LIM et al: RELIABILITY DISTRIBUTIONS OF TRUNCATED MAX-LOG-MAP (MLM) DECODER APPLIED TO ISI CHANNELS 1 

Reliability Distributions of Truncated Max-log-map 
(MLM) Detectors Applied to Binary ISI Channels 

Fabian Lim and Aleksandar Kavcic 






Abstract — The max-log-map (MLM) receiver is an approx- 
imated version of the well-known, Bahl-Cocke-Jelinek-Raviv 
(BCJR) algorithm. The MLM algorithm is attractive due to its 
implementation simplicity. In practice, sliding-window implemen- 
tations are preferred; these practical implementations consider 
truncated signaling neighborhoods around each transmission 
time instant. In this paper, we consider the binary signaling 
case. We consider sliding-window MLM receivers, where for any 
integer m, the MLM detector is truncated to a length-m signaling 
neighborhood. For any number n of chosen time instants, we 
derive exact expressions for both i) the joint distribution of the 
MLM symbol reliabilities, and ii) the joint probability of the 
erroneous MLM symbol detections. 

We show that the obtained expressions can be efficiently 
evaluated using Monte-Carlo techniques. Our proposed method 
is efficient; the most computationally expensive operation (in each 
Monte-Carlo trial) is an eigenvalue decomposition of a size 2mn 
by 2mn matrix. Finally, our proposed method handles various 
scenarios such as correlated noise distributions, modulation 
coding, etc. 

Index Terms — detection, intersymbol interference, max-log-map, 
probability distribution, reliability 



I. Introduction 

The intersymbol interference (ISI) channel has been widely 
studied in communication theory. In optimal detection schemes 
for the ISI channel, input-output sequences, rather than indi- 
vidual symbols, have to be considered [1]. Sequence detectors 
such as the Viterbi detector, only compute hard decisions [2]. 
On the other hand, modern coding techniques require detection 
schemes that also compute symbol reliabilities (also known 
as soft-outputs, log-likelihood ratios, etc.,) [3], [4], [5]. Some 
commonly cited detectors that perform this task, include the 
soft-output Viterbi algorithm (SOVA) [6], the Bahl-Cocke- 
Jelinek-Raviv (BCJR) algorithm [7], and the max-log-map 
(MLM) detector [8]. These detectors have been in use for 
some time; however, literature on their analysis is scarce. 
That being said, it appears there has been recent interest in 
the analysis of the MLM detector. The marginal symbol error 
probability has been derived for a 2-state convolutional code 
in [9]; this has been further extended for convolutional codes 
with constraint length two in [10]. Also, approximations for 
the MLM reliability distributions are obtained in [11], [12]. 

In this paper, we consider the MLM receiver applied, using 
binary signaling, to an intersymbol interference (ISI) channel. 
[Figure 1: trellis of channel states. State at time $t$: $[A_{t-\ell+1}, A_{t-\ell+2}, \cdots, A_t]^T$. Channel output: $Z_t = A_{t-\ell} h_\ell + A_{t-\ell+1} h_{\ell-1} + \cdots + A_t h_0 - W_t$.]

Fig. 1. Time evolution of the channel states. Given the state at time $t-1$, 
the channel input $A_t$ determines the new state at time $t$. The channel output 
$Z_t$ depends on the two neighboring states. 

In particular, we consider its sliding-window implementation. An 
MLM receiver is termed m-truncated if it only considers 
a signaling window of length m around the time instant of 
interest. The analysis of m-truncated MLM receivers is shown 
to be tractable: for any number n of chosen time 
instants, we derive exact, closed-form expressions for both i) 
the joint distribution of the symbol reliabilities, and ii) the 
joint probability that the detected symbols are in error. While 
past work considered only marginal distributions, we provide 
analytic expressions for joint MLM receiver statistics. Our 
derivation is short and follows from a single key observation. 
Notation: Deterministic quantities are denoted as follows. 
Bold fonts are used to distinguish both vectors and matrices 
(e.g. denoted $\mathbf{a}$ and $\mathbf{A}$, respectively) from scalar quantities 
(e.g. denoted $a$). Next, random quantities are denoted as 
follows. Scalars are denoted using upper-case italics (e.g. 
denoted $A$) and vectors denoted using upper-case bold italics 
(e.g. denoted $\boldsymbol{A}$). Note that we do not reserve specific notation 
for random matrices. Throughout the paper both $t$ and $\tau$ are 
used to denote time indices. Sets are denoted using curly 
braces, e.g. $\{a_1, a_2, a_3, \cdots\}$. Also, both $\alpha$ and $\beta$ are used for 
auxiliary notation as needed. Finally, the maximization over 
the components of the size-$n$ vector $\mathbf{a} = [a_1, a_2, \cdots, a_n]^T$ 
may be written either explicitly as $\max_{i \in \{1,2,\cdots,n\}} a_i$, or 
concisely as $\max \mathbf{a}$. Events are denoted in curly brackets, e.g. 
$\{A \le a\}$ is the event where $A$ is at most $a$. The probability 
of the event $\{A \le a\}$ is denoted $\Pr\{A \le a\}$. The letter 
$F$ is reserved to denote probability cumulative distribution 
functions, i.e. $F_A(a) = \Pr\{A \le a\}$. The expectation of $A$ 
is denoted as $E\{A\}$. 

II. The MLM Algorithm 

A random sequence of symbols drawn from the set $\{-1, 1\}$, denoted as $\cdots, A_{-2}, A_{-1}, A_0, A_1, A_2, \cdots$, is transmitted across the ISI channel. Let the random sequence $\cdots, Z_{-2}, Z_{-1}, Z_0, Z_1, Z_2, \cdots$ be the ISI channel output sequence. Let $h_0, h_1, \cdots, h_\ell$ denote the ISI channel coefficients, where $\ell$ is a non-negative integer. The input-output relationship of the ISI channel is given by the following equation

$$ Z_t = \sum_{i=0}^{\ell} h_i A_{t-i} - W_t, \tag{1} $$

and we assume that the noise^1 samples $\cdots, W_{-2}, W_{-1}, W_0, W_1, W_2, \cdots$ are zero-mean and jointly Gaussian distributed (note that we do not assume they are independent).

Definition 1. The ISI channel state at time $t$ equals the (length-$\ell$) vector of input symbols $[A_{t-\ell+1}, A_{t-\ell+2}, \cdots, A_t]^T$. The constant $\ell$ in (1) is termed the ISI channel memory length.

Figure 1 depicts the time evolution of the ISI channel states. The total number of possible states is clearly $2^\ell$, which is exponential in the memory length $\ell$.

A. The m-truncated max-log-map (MLM) detector

We proceed to describe the sliding-window MLM receiver. At time instant $t$, the m-truncated MLM detector considers the neighborhood of $2m + \ell + 1$ channel outputs $\mathbf{Z}_t = [Z_{t-m}, Z_{t-m+1}, \cdots, Z_{t+m+\ell}]^T$. Define the symbol neighborhood $\mathbf{A}_t$ containing the following $2(m + \ell) + 1$ input symbols

$$ \mathbf{A}_t \triangleq [A_{t-m-\ell}, A_{t-m-\ell+1}, \cdots, A_{t+m+\ell}]^T. \tag{2} $$

Both $\mathbf{A}_t$ and $\mathbf{Z}_t$ are depicted in Figure 2. Let $\mathbf{h}_i$ denote the following length-$(2m + \ell + 1)$ vector

$$ \mathbf{h}_i \triangleq [\underbrace{0, \cdots, 0}_{m+i}, h_0, h_1, \cdots, h_\ell, \underbrace{0, \cdots, 0}_{m-i}]^T, \tag{3} $$

where $i$ can take values $|i| \le m$. Let $\mathbf{0}$ denote an all-zeros vector $\mathbf{0} \triangleq [0, 0, \cdots, 0]^T$. Let both $\mathbf{H}$ and $\mathbf{T}$ denote the size $2m + \ell + 1$ by $2(m + \ell) + 1$ matrices given as

$$ \mathbf{H} \triangleq [\underbrace{\mathbf{0}, \cdots, \mathbf{0}}_{\ell}, \mathbf{h}_{-m}, \mathbf{h}_{-m+1}, \cdots, \mathbf{h}_{m}, \underbrace{\mathbf{0}, \cdots, \mathbf{0}}_{\ell}], \qquad \mathbf{T} \triangleq \begin{bmatrix} \mathbf{T}_1 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{T}_2 \end{bmatrix}, \tag{4} $$

where the two size-$\ell$ submatrices $\mathbf{T}_1$ (first $\ell$ rows and columns of $\mathbf{T}$) and $\mathbf{T}_2$ (last $\ell$ rows and columns of $\mathbf{T}$) equal

$$ \mathbf{T}_1 \triangleq \begin{bmatrix} h_\ell & h_{\ell-1} & \cdots & h_1 \\ & h_\ell & \cdots & h_2 \\ & & \ddots & \vdots \\ & & & h_\ell \end{bmatrix}, \qquad \mathbf{T}_2 \triangleq \begin{bmatrix} h_0 & & & \\ \vdots & \ddots & & \\ h_{\ell-2} & \cdots & h_0 & \\ h_{\ell-1} & \cdots & h_1 & h_0 \end{bmatrix}. $$

Using (4), rewrite $\mathbf{Z}_t = [Z_{t-m}, Z_{t-m+1}, \cdots, Z_{t+m+\ell}]^T$ using (1) into the following form

$$ \mathbf{Z}_t = (\mathbf{H} + \mathbf{T})\mathbf{A}_t - \mathbf{W}_t, \tag{5} $$

where here $\mathbf{W}_t$ denotes the neighborhood of noise samples

$$ \mathbf{W}_t \triangleq [W_{t-m}, W_{t-m+1}, \cdots, W_{t+m+\ell}]^T. \tag{6} $$

^1 To obtain neater expressions in the sequel, the Gaussian noise sample $W_t$ in (1) is subtracted. This differs from convention, where $W_t$ is typically added [1]. Note there is no loss in generality when subtracting, because the Gaussian distribution is symmetric about its mean.

Definition 2. Denote the set $\mathcal{M}$ that contains the m-truncated MLM candidate sequences

$$ \mathcal{M} \triangleq \left\{ \mathbf{a} \in \{-1, 1\}^{2(m+\ell)+1} : a_i = 1 \text{ for all } |i| > m \right\}. \tag{7} $$

Each candidate $\mathbf{a} \in \mathcal{M}$ has the following form

$$ \mathbf{a} = [1, 1, \cdots, 1, a_{-m}, a_{-m+1}, \cdots, a_m, 1, 1, \cdots, 1]^T, $$

i.e. candidates $\mathbf{a} \in \mathcal{M}$ have boundary^2 symbols equal to 1.

An example of a candidate sequence in the set $\mathcal{M}$ is illustrated in Figure 2. The boundary symbols of the candidates $\mathbf{a} \in \mathcal{M}$ are fixed, because the boundary symbols of the transmitted sequence $\mathbf{A}_t$ are unknown to the detector. The start/end states of $\mathbf{A}_t$ (colored black) are shown (see Figure 2) to be different from the start/end states of the candidate $\mathbf{a} \in \mathcal{M}$ (colored white).
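The matrix form (5) can be checked numerically against the scalar relation (1). The following is a minimal sketch, not part of the original derivation: the helper name `channel_matrices`, the tap values and the value of m are illustrative assumptions, and NumPy is assumed to be available. It builds H and T from the taps and compares (H + T)A_t - W_t with a direct convolution.

```python
import numpy as np

def channel_matrices(h, m):
    """Hypothetical helper: return (H, T) of (4) for taps h = [h_0, ..., h_ell]."""
    ell = len(h) - 1
    rows, cols = 2 * m + ell + 1, 2 * (m + ell) + 1
    C = np.zeros((rows, cols))            # C = H + T, the full banded channel matrix
    for r in range(rows):                 # row r corresponds to Z_{t-m+r}
        for i, tap in enumerate(h):       # column r+ell-i corresponds to A_{t-m+r-i}
            C[r, r + ell - i] = tap
    interior = np.zeros(cols, dtype=bool)
    interior[ell:ell + 2 * m + 1] = True  # columns of A_{t-m}, ..., A_{t+m}
    H = C * interior                      # interior columns only
    T = C * ~interior                     # boundary columns only
    return H, T

rng = np.random.default_rng(0)
h = np.array([1.0, 0.5, -0.25])           # assumed example taps, ell = 2
m = 3
ell = len(h) - 1
H, T = channel_matrices(h, m)
A = rng.choice([-1.0, 1.0], size=2 * (m + ell) + 1)   # symbol neighborhood (2)
W = rng.normal(size=2 * m + ell + 1)                   # noise neighborhood (6)

Z_matrix = (H + T) @ A - W                             # equation (5)
Z_direct = np.convolve(A, h, mode="valid") - W         # equation (1), sample by sample
print(np.allclose(Z_matrix, Z_direct))
```

Splitting the banded matrix by columns makes the roles of H (symbols the detector searches over) and T (boundary symbols fixed to 1 in the candidates) explicit.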



Let the sequence $\cdots, B_{-2}, B_{-1}, B_0, B_1, B_2, \cdots$ denote symbol decisions on the channel inputs $\cdots, A_{-2}, A_{-1}, A_0, A_1, A_2, \cdots$. Let $\mathbf{1}$ denote the all-ones vector $\mathbf{1} \triangleq [1, 1, \cdots, 1]^T$. In the following let $|\mathbf{a}|$ denote the Euclidean norm of the vector $\mathbf{a}$.

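Definition 3. The symbol decision $B_t$ on channel input $A_t$ is obtained by i) computing the sequence $\mathbf{B}^{[t]}$ that achieves the following minimum

$$ \mathbf{B}^{[t]} \triangleq \arg\min_{\mathbf{a} \in \mathcal{M}} |\mathbf{Z}_t - (\mathbf{H} + \mathbf{T})\mathbf{a}|^2 = \arg\min_{\mathbf{a} \in \mathcal{M}} |\mathbf{Z}_t - \mathbf{T}\mathbf{1} - \mathbf{H}\mathbf{a}|^2, \tag{8} $$

and ii) setting the symbol decision $B_t$ to the 0-th component of $\mathbf{B}^{[t]}$ in (8), i.e. set $B_t = B_0^{[t]}$, where the sequence $\mathbf{B}^{[t]} = [1, \cdots, 1, B_{-m}^{[t]}, B_{-m+1}^{[t]}, \cdots, B_{m}^{[t]}, 1, \cdots, 1]^T$.

The sequence $\mathbf{B}^{[t]}$ in (8), and therefore the symbol decision $B_t$, is obtained by considering the candidate sequences in the set $\mathcal{M}$; recall Definition 2 and refer to Figure 2. Note that $\mathbf{B}^{[t]}$ does not equal the MLM bit detection sequence $\cdots, B_{-2}, B_{-1}, B_0, B_1, B_2, \cdots$; only the $t$-th symbol $B_t$ is obtained from $\mathbf{B}^{[t]}$. To obtain $\mathbf{B}^{[t]}$, we compare the squared Euclidean distances of each candidate $\mathbf{H}\mathbf{a}$ from the received neighborhood $\mathbf{Z}_t - \mathbf{T}\mathbf{1}$.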

^2 Alternatively, the boundary symbols can be specified to be any sequence of choice in the set $\{-1, 1\}^\ell$; here we choose the boundary sequence $[1, 1, \cdots, 1]^T = \mathbf{1}$ simply for clearer exposition.

In addition to computing hard, i.e., $\{-1, 1\}$, symbol decisions $\cdots, B_{-2}, B_{-1}, B_0, B_1, B_2, \cdots$, the m-truncated MLM also computes the symbol reliability sequence, to be denoted as $\cdots, R_{-2}, R_{-1}, R_0, R_1, R_2, \cdots$. Consider the following 
[Figure 2: trellis diagram over the symbols $\mathbf{A}_t$ and the channel observations $\mathbf{Z}_t$, spanning $m + \ell$ past samples, time $t$, and $m + \ell$ future samples. The m-neighborhood is flanked by boundary symbols on both sides; the candidate starting state and ending state are both fixed to $[1, 1]^T$. Black states correspond to $\mathbf{A}_t$, white states correspond to a candidate $\mathbf{a} \in \mathcal{M}$.]

Fig. 2. The m-truncated Max-Log-Map (MLM) detector. Here we illustrate the case $m = 6$ and $\ell = 2$, where the time evolution of the ISI channel states is depicted similarly as before in Figure 1. All $2^\ell = 4$ possible states are shown. Channel states colored black and white correspond respectively to the symbol neighborhood $\mathbf{A}_t$, and a candidate sequence $\mathbf{a}$ in the set $\mathcal{M}$ (see Definition 2). As shown, $\mathbf{A}_t$ and $\mathbf{a}$ may not have the same starting and/or end states.



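log-likelihood approximation (see [8])

$$ \log\frac{\Pr\{A_t = B_t \mid \mathbf{Z}_t\}}{\Pr\{A_t \neq B_t \mid \mathbf{Z}_t\}} = \log\frac{\Pr\{\mathbf{Z}_t \mid A_t = B_t\}}{\Pr\{\mathbf{Z}_t \mid A_t \neq B_t\}} \approx \min_{\substack{\mathbf{a} \in \mathcal{M} \\ a_0 \neq B_t}} \frac{1}{2\sigma^2}|\mathbf{Z}_t - \mathbf{T}\mathbf{1} - \mathbf{H}\mathbf{a}|^2 - \min_{\substack{\mathbf{a} \in \mathcal{M} \\ a_0 = B_t}} \frac{1}{2\sigma^2}|\mathbf{Z}_t - \mathbf{T}\mathbf{1} - \mathbf{H}\mathbf{a}|^2, \tag{9} $$

where the first equality assumes^3 uniform signal priors, i.e. $\Pr\{\mathbf{A}_t = \mathbf{a}\} = 2^{-2(m+\ell)-1}$, see (2). We also denote $\sigma^2$ as the worst-case^4 noise variance

$$ \sigma^2 \triangleq \sup_t E\{W_t^2\}. \tag{10} $$

We assume that $\sigma^2$ is bounded, i.e. $\sigma^2 < \infty$. We want to set the (m-truncated MLM) reliability $R_t$ to equal the log-likelihood approximation (9); before formally stating the expression for $R_t$, we first make another definition. Denote the difference in the obtained squared Euclidean distances

$$ \Delta(\mathbf{a}, \bar{\mathbf{a}}) = \Delta(\mathbf{a}, \bar{\mathbf{a}}; \mathbf{Z}_t) \triangleq |\mathbf{Z}_t - \mathbf{T}\mathbf{1} - \mathbf{H}\mathbf{a}|^2 - |\mathbf{Z}_t - \mathbf{T}\mathbf{1} - \mathbf{H}\bar{\mathbf{a}}|^2, \tag{11} $$

where both $\mathbf{a}$ and $\bar{\mathbf{a}}$ are arbitrary sequences in $\{-1, 1\}^{2(m+\ell)+1}$. Recalling (8), we write $R_t$ as follows.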

Definition 4. The non-negative m-truncated MLM reliability $R_t$ is defined as

$$ R_t \triangleq \min_{\substack{\mathbf{a} \in \mathcal{M} \\ a_0 \neq B_t}} \frac{1}{2\sigma^2}\Delta(\mathbf{a}, \mathbf{B}^{[t]}), \tag{12} $$

where $\Delta(\mathbf{a}, \mathbf{B}^{[t]}) \ge 0$ is the difference in the obtained squared Euclidean distances corresponding to candidates $\mathbf{a}, \mathbf{B}^{[t]} \in \mathcal{M}$, and $\sigma^2$ is the noise variance (10).

^3 The relaxation of this assumption is discussed in the latter half of the upcoming Subsection III-C, where we allow some of the probabilities $\Pr\{\mathbf{A}_t = \mathbf{a}\}$ to equal zero, i.e. in the case of modulation coding. We also comment on non-uniform signal priors in the upcoming Remark 4.

^4 If $W_t$ is stationary, then $\sigma^2 = E\{W_t^2\}$.

Note that $\Delta(\mathbf{a}, \mathbf{B}^{[t]}) \ge 0$ for all $\mathbf{a} \in \mathcal{M}$, simply because $\mathbf{B}^{[t]}$ achieves the minimum squared Euclidean distance amongst all candidates in $\mathcal{M}$, see (8).
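Definitions 3 and 4 can be illustrated by exhaustive search over the candidate set for small m. The following is a minimal brute-force sketch under stated assumptions: the function names `channel_matrices` and `mlm_detect`, the taps and the test sequence are hypothetical, and the search costs 2^(2m+1) distance evaluations, so it is only meant for tiny examples.

```python
import itertools
import numpy as np

def channel_matrices(h, m):
    """Hypothetical helper: return (H, T) of (4) for taps h = [h_0, ..., h_ell]."""
    ell = len(h) - 1
    rows, cols = 2 * m + ell + 1, 2 * (m + ell) + 1
    C = np.zeros((rows, cols))
    for r in range(rows):
        for i, tap in enumerate(h):
            C[r, r + ell - i] = tap
    interior = np.zeros(cols, dtype=bool)
    interior[ell:ell + 2 * m + 1] = True
    return C * interior, C * ~interior

def mlm_detect(Z, h, m, sigma2):
    """Brute-force m-truncated MLM: decision B_t (8) and reliability R_t (12)."""
    ell = len(h) - 1
    H, T = channel_matrices(h, m)
    ones = np.ones(H.shape[1])
    target = Z - T @ ones                      # received neighborhood Z_t - T1
    best = {1: np.inf, -1: np.inf}             # smallest distance for each value of a_0
    for interior in itertools.product([-1.0, 1.0], repeat=2 * m + 1):
        a = ones.copy()                        # boundary symbols fixed to 1
        a[ell:ell + 2 * m + 1] = interior
        dist = float(np.sum((target - H @ a) ** 2))
        a0 = int(interior[m])                  # 0-th (center) component
        best[a0] = min(best[a0], dist)
    B = 1 if best[1] <= best[-1] else -1
    R = (best[-B] - best[B]) / (2.0 * sigma2)  # min over a_0 != B_t of Delta / (2 sigma^2)
    return B, R

h = np.array([1.0, 0.5, -0.25])                # assumed example taps
m = 2
ell = len(h) - 1
A = np.ones(2 * (m + ell) + 1)
A[ell:ell + 2 * m + 1] = [1, -1, -1, 1, -1]    # center symbol A_t = -1
H, T = channel_matrices(h, m)
Z = (H + T) @ A                                # noiseless observation for illustration
B, R = mlm_detect(Z, h, m, sigma2=0.5)
print(B, R)
```

In this noiseless example the transmitted sequence is itself a candidate, so the decision recovers the center symbol and the reliability is strictly positive.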

III. Key Observation and Statement of Main Result

This section contains three subsections. In the first subsection, we describe an important key observation; the main result of this paper is derived based on this observation. In the second subsection, we state the main result and give closed-form expressions for i) the joint reliability distribution $F_{R_{t_1}, R_{t_2}, \cdots, R_{t_n}}(r_1, r_2, \cdots, r_n)$, and ii) the joint symbol error probability $\Pr\{\bigcap_{i=1}^{n} \{B_{t_i} \neq A_{t_i}\}\}$. The result holds for any number $n$ of arbitrarily chosen time instants $t_1, t_2, \cdots, t_n$. Also in the second subsection, a Monte-Carlo based procedure that evaluates these closed-form expressions is given. In the third subsection, we address two important points regarding the given Monte-Carlo procedure, namely i) how to efficiently implement this procedure, and ii) how this procedure may be modified when one wishes to only consider a subset $\bar{\mathcal{M}} \subset \mathcal{M}$ of the candidates $\mathcal{M}$ (recall Definition 2).



A. Key observation 

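For all times $t$, define the following two random variables $X_t$ and $Y_t$ as

$$ X_t \triangleq \max_{\substack{\mathbf{a} \in \mathcal{M} \\ a_0 \neq A_t}} \frac{1}{4}\Delta(\mathbf{A}_t, \mathbf{a}), \qquad Y_t \triangleq \max_{\substack{\mathbf{a} \in \mathcal{M} \\ a_0 = A_t}} \frac{1}{4}\Delta(\mathbf{A}_t, \mathbf{a}) \ge 0, \tag{13} $$

where $\Delta(\mathbf{A}_t, \mathbf{a})$ is the difference in obtained squared Euclidean distances, corresponding to the transmitted sequence $\mathbf{A}_t$ and a candidate $\mathbf{a} \in \mathcal{M}$, see (11). Note that the random variable $Y_t$ satisfies $Y_t \ge 0$, because there must exist a candidate $\mathbf{a} \in \mathcal{M}$ that satisfies $\Delta(\mathbf{A}_t, \mathbf{a}) = 0$, see (11); this particular candidate $\mathbf{a} \in \mathcal{M}$ satisfies $a_i = A_{t+i}$ for all values of $i$ satisfying $|i| \le m$.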
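Proposition 1 (Key Observation). The m-truncated MLM reliability $R_t$ in (12) satisfies

$$ R_t = \frac{2}{\sigma^2}|X_t - Y_t|, \tag{14} $$

where both random variables $X_t$ and $Y_t$ are given in (13). □

Proof: Scale (12) by $\sigma^2/2$ and write

$$ \frac{\sigma^2}{2} \cdot R_t = \min_{\substack{\mathbf{a} \in \mathcal{M} \\ a_0 \neq B_t}} \left[ \frac{\Delta(\mathbf{a}, \mathbf{A}_t)}{4} + \frac{\Delta(\mathbf{A}_t, \mathbf{B}^{[t]})}{4} \right] = -\max_{\substack{\mathbf{a} \in \mathcal{M} \\ a_0 \neq B_t}} \frac{\Delta(\mathbf{A}_t, \mathbf{a})}{4} + \frac{\Delta(\mathbf{A}_t, \mathbf{B}^{[t]})}{4}. \tag{15} $$

To obtain the last equality in (15), we used the relationship $\Delta(\mathbf{A}_t, \mathbf{a}) = -\Delta(\mathbf{a}, \mathbf{A}_t)$, see (11). Recall Definition 3, which states the symbol decision $B_t$. Because $B_t$ is either $-1$ or $1$, we have either $B_t \neq A_t$ or $B_t = A_t$. Consider the former case $B_t \neq A_t$, in which (15) reduces to

$$ \frac{\sigma^2}{2} \cdot R_t = -\max_{\substack{\mathbf{a} \in \mathcal{M} \\ a_0 = A_t}} \frac{\Delta(\mathbf{A}_t, \mathbf{a})}{4} + \max_{\substack{\mathbf{a} \in \mathcal{M} \\ a_0 \neq A_t}} \frac{\Delta(\mathbf{A}_t, \mathbf{a})}{4} = -Y_t + X_t = |X_t - Y_t|, $$

where the second equality follows from (13), and the third from the fact $R_t \ge 0$, see Definition 4. We have thus shown (14) for the case $B_t \neq A_t$. The same conclusion follows for the other case $B_t = A_t$ in a similar manner. ■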

Note that the expression (14) for $R_t$ in Proposition 1 cannot be computed in practice; it is developed purely for analysis purposes. This is because (14) relies on the ability to compute $X_t$ and $Y_t$, which in turn requires knowledge of the transmitted sequence $\mathbf{A}_t$. Clearly, the detector cannot be assumed to know $\mathbf{A}_t$.
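Proposition 1 can be sanity-checked numerically for a small example by computing the reliability once from Definition 4 and once from (13)-(14). The sketch below is illustrative only: `channel_matrices`, the taps, m, the noise variance and the random seed are assumptions, white Gaussian noise is used for simplicity, and all candidates are enumerated by brute force.

```python
import itertools
import numpy as np

def channel_matrices(h, m):
    """Hypothetical helper: return (H, T) of (4) for taps h = [h_0, ..., h_ell]."""
    ell = len(h) - 1
    rows, cols = 2 * m + ell + 1, 2 * (m + ell) + 1
    C = np.zeros((rows, cols))
    for r in range(rows):
        for i, tap in enumerate(h):
            C[r, r + ell - i] = tap
    interior = np.zeros(cols, dtype=bool)
    interior[ell:ell + 2 * m + 1] = True
    return C * interior, C * ~interior

rng = np.random.default_rng(1)
h = np.array([1.0, 0.5, -0.25])              # assumed example taps
m, sigma2 = 2, 1.0
ell = len(h) - 1
H, T = channel_matrices(h, m)
ones = np.ones(H.shape[1])

A = rng.choice([-1.0, 1.0], size=H.shape[1]) # transmitted neighborhood, arbitrary boundaries
W = rng.normal(scale=np.sqrt(sigma2), size=H.shape[0])
Z = (H + T) @ A - W                          # equation (5)

def dist(a):
    """Squared distance |Z_t - T1 - Ha|^2 appearing in (11)."""
    return float(np.sum((Z - T @ ones - H @ a) ** 2))

cands = []                                   # pairs (a_0, distance) for every a in M
for interior in itertools.product([-1.0, 1.0], repeat=2 * m + 1):
    a = ones.copy()
    a[ell:ell + 2 * m + 1] = interior
    cands.append((interior[m], dist(a)))

# Reliability via Definition 4 (requires the detected sequence B^[t]).
B, d_B = min(cands, key=lambda p: p[1])
R_def = min(d - d_B for a0, d in cands if a0 != B) / (2 * sigma2)

# Reliability via Proposition 1 (requires the transmitted sequence A_t instead).
A_t, d_A = A[m + ell], dist(A)
X = max((d_A - d) / 4 for a0, d in cands if a0 != A_t)
Y = max((d_A - d) / 4 for a0, d in cands if a0 == A_t)
R_prop = 2 / sigma2 * abs(X - Y)
print(R_def, R_prop)
```

The two values agree up to floating-point rounding, whether or not the decision B_t is in error for the sampled noise.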

Remark 1. From past literature (e.g. [11]), there seems to be a misconception that the reliability $R_t$ must be expressed in terms of $\mathbf{B}^{[t]}$ (as in (12)). However, as shown in Proposition 1, this is not true. The reliability $R_t$ can be simply written as $R_t = 2/\sigma^2 \cdot |X_t - Y_t|$, where we see from (13) that both $X_t$ and $Y_t$ depend only on the transmitted sequence $\mathbf{A}_t$. In other words, the reliability $R_t$ can be alternatively computed using (14), which does not require any knowledge of $\mathbf{B}^{[t]}$.

As mentioned before, the key observation in Proposition 1 will be used to prove the main result. Before going into detailed derivations, we first state the main result in the next subsection; we believe that doing so better motivates the significance of this work.

B. Statement of main result 

For any number $n$ of arbitrarily chosen time instants $t_1, t_2, \cdots, t_n$, we wish to obtain the distribution of the vector $\mathbf{R}_{t_1^n}$, containing the following reliabilities

$$ \mathbf{R}_{t_1^n} \triangleq [R_{t_1}, R_{t_2}, \cdots, R_{t_n}]^T. \tag{16} $$

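Definition 5. Define the binary vector $\mathbf{e}_i$ of size $2(m + \ell) + 1$ as

$$ \mathbf{e}_i \triangleq [\underbrace{0, 0, \cdots, 0}_{m+\ell+i}, 1, \underbrace{0, 0, \cdots, 0}_{m+\ell-i}]^T, \tag{17} $$

where $i$ can take values $|i| \le m + \ell$. Further define the matrix $\mathbf{E}$ of size $2(m + \ell) + 1$ by $2m$ as

$$ \mathbf{E} \triangleq [\mathbf{e}_{-m}, \mathbf{e}_{-m+1}, \cdots, \mathbf{e}_{-1}, \mathbf{e}_{1}, \mathbf{e}_{2}, \cdots, \mathbf{e}_{m}]. \tag{18} $$

Definition 6. Define the matrix $\mathbf{S}$ of size $2m$ by $2^{2m}$ as

$$ \mathbf{S} \triangleq [\mathbf{s}_0, \mathbf{s}_1, \cdots, \mathbf{s}_{2^{2m}-1}], \tag{19} $$

where the columns $\mathbf{s}_0, \mathbf{s}_1, \cdots, \mathbf{s}_{2^{2m}-1}$ make up all $2^{2m}$ possible length-$(2m)$ binary vectors, i.e. $\{\mathbf{s}_0, \mathbf{s}_1, \cdots, \mathbf{s}_{2^{2m}-1}\} = \{0, 1\}^{2m}$.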

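Let $\mathrm{diag}(\mathbf{A}_t)$ denote the diagonal matrix whose diagonal equals the vector $\mathbf{A}_t$. Recall the size $2m + \ell + 1$ by $2(m + \ell) + 1$ channel matrix $\mathbf{H}$ given in (4). Define the matrix $\mathbf{G}(\mathbf{A}_t)$ of size $2m + \ell + 1$ by $2m$ as

$$ \mathbf{G}(\mathbf{A}_t) \triangleq \mathbf{H}\,\mathrm{diag}(\mathbf{A}_t)\,\mathbf{E}. \tag{20} $$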

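Recall the noise neighborhood $\mathbf{W}_t$ from (6). Let $\mathbf{W}_{t_1^n}$ denote the concatenation

$$ \mathbf{W}_{t_1^n} \triangleq \begin{bmatrix} \mathbf{W}_{t_1} \\ \mathbf{W}_{t_2} \\ \vdots \\ \mathbf{W}_{t_n} \end{bmatrix}. \tag{21} $$

Definition 7. Define the noise covariance matrix

$$ \mathbf{K}_W \triangleq \begin{bmatrix} E\{\mathbf{W}_{t_1}\mathbf{W}_{t_1}^T\} & \cdots & E\{\mathbf{W}_{t_1}\mathbf{W}_{t_n}^T\} \\ \vdots & \ddots & \vdots \\ E\{\mathbf{W}_{t_n}\mathbf{W}_{t_1}^T\} & \cdots & E\{\mathbf{W}_{t_n}\mathbf{W}_{t_n}^T\} \end{bmatrix} = E\{\mathbf{W}_{t_1^n}\mathbf{W}_{t_1^n}^T\}. \tag{22} $$

Note that $\mathbf{K}_W$ is generally not Toeplitz even if $W_t$ is stationary.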
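Similarly to (21), let $\mathbf{A}_{t_1^n}$ denote the concatenation

$$ \mathbf{A}_{t_1^n} \triangleq \begin{bmatrix} \mathbf{A}_{t_1} \\ \mathbf{A}_{t_2} \\ \vdots \\ \mathbf{A}_{t_n} \end{bmatrix}. \tag{23} $$

Let $\mathbf{I}$ denote the identity matrix; in particular $\mathbf{I}_{2m}$ has size $2m$ by $2m$. The matrix $\mathbf{S}\mathbf{S}^T$ can be verified to have the following simple expression

$$ \mathbf{S}\mathbf{S}^T = \sum_{k=0}^{2^{2m}-1} \mathbf{s}_k\mathbf{s}_k^T = 2^{2(m-1)}\left[\mathbf{I}_{2m} + \mathbf{1}\mathbf{1}^T\right], \tag{24} $$

where the vector $\mathbf{1} \triangleq [1, 1, \cdots, 1]^T$. Denote the matrix Kronecker product using the operation $\otimes$. Let $\mathrm{diag}(\mathbf{G}(\mathbf{A}_{t_1}), \mathbf{G}(\mathbf{A}_{t_2}), \cdots, \mathbf{G}(\mathbf{A}_{t_n}))$ denote a block diagonal matrix, whose block-diagonal entries equal $\mathbf{G}(\mathbf{A}_{t_1}), \mathbf{G}(\mathbf{A}_{t_2}), \cdots, \mathbf{G}(\mathbf{A}_{t_n})$.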
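The closed form (24) for $\mathbf{S}\mathbf{S}^T$ is easy to confirm by enumerating the columns of $\mathbf{S}$. The short check below is a sketch with an assumed small value m = 2; the column ordering produced by `itertools.product` is one arbitrary choice, which does not affect $\mathbf{S}\mathbf{S}^T$.

```python
import itertools
import numpy as np

m = 2  # assumed small truncation length for the check
# Columns of S: all 2^{2m} binary vectors of length 2m, see (19).
S = np.array(list(itertools.product([0, 1], repeat=2 * m))).T

lhs = S @ S.T
rhs = 2 ** (2 * (m - 1)) * (np.eye(2 * m) + np.ones((2 * m, 2 * m)))
print(np.array_equal(lhs, rhs))
```

Each diagonal entry counts the binary vectors with a given bit set, and each off-diagonal entry counts those with two given bits set, which is where the factor $2^{2(m-1)}$ comes from.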
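Definition 8. Let the square matrix $\mathbf{Q} = \mathbf{Q}(\mathbf{A}_{t_1^n})$ of size $2mn$ by $2mn$ satisfy the following two conditions:

i) the matrix $\mathbf{Q}$ decomposes the following size $2mn$ matrix

$$ \mathbf{Q}\boldsymbol{\Lambda}^2\mathbf{Q}^T = \mathrm{diag}(\mathbf{G}(\mathbf{A}_{t_1}), \mathbf{G}(\mathbf{A}_{t_2}), \cdots, \mathbf{G}(\mathbf{A}_{t_n}))^T\, \mathbf{K}_W\, \mathrm{diag}(\mathbf{G}(\mathbf{A}_{t_1}), \mathbf{G}(\mathbf{A}_{t_2}), \cdots, \mathbf{G}(\mathbf{A}_{t_n})), \tag{25} $$

where $\boldsymbol{\Lambda} = \boldsymbol{\Lambda}(\mathbf{A}_{t_1^n})$ on the l.h.s. of (25) is a diagonal matrix. The number of positive diagonal elements in the matrix $\boldsymbol{\Lambda}$ equals the rank of the matrix on the r.h.s. of (25).

ii) the matrix $\mathbf{Q}$ diagonalizes the matrix $\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T$, i.e. the matrix $\mathbf{Q}$ satisfies

$$ \mathbf{Q}^T(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T)\mathbf{Q} = \mathbf{I}, \tag{26} $$

noting that the matrix $\mathbf{S}\mathbf{S}^T$ is square of size $2m$.

It is shown in Appendix A how to compute such a matrix $\mathbf{Q} = \mathbf{Q}(\mathbf{A}_{t_1^n})$, and also obtain the diagonal matrix $\boldsymbol{\Lambda} = \boldsymbol{\Lambda}(\mathbf{A}_{t_1^n})$ in (25). We partition the matrix $\mathbf{Q}$ into $n$ partitions of equal size $2m$ by $2mn$, i.e.,

$$ \mathbf{Q} = \begin{bmatrix} \mathbf{Q}_1 \\ \mathbf{Q}_2 \\ \vdots \\ \mathbf{Q}_n \end{bmatrix}. \tag{27} $$

Let $\mathrm{diag}(A_{t_1}, A_{t_2}, \cdots, A_{t_n})$ denote the diagonal matrix whose diagonal equals $[A_{t_1}, A_{t_2}, \cdots, A_{t_n}]^T$. Define the size $n$ by $2mn$ matrix $\mathbf{F}(\mathbf{A}_{t_1^n})$ as^5

$$ \mathbf{F}(\mathbf{A}_{t_1^n}) \triangleq \left[\mathrm{diag}(A_{t_1}, A_{t_2}, \cdots, A_{t_n}) \otimes \mathbf{h}_0^T\right] \mathbf{K}_W \begin{bmatrix} \mathbf{G}(\mathbf{A}_{t_1}) & & \\ & \ddots & \\ & & \mathbf{G}(\mathbf{A}_{t_n}) \end{bmatrix} \begin{bmatrix} \mathbf{S}\mathbf{S}^T\mathbf{Q}_1 \\ \mathbf{S}\mathbf{S}^T\mathbf{Q}_2 \\ \vdots \\ \mathbf{S}\mathbf{S}^T\mathbf{Q}_n \end{bmatrix} \boldsymbol{\Lambda}^{\dagger}, \tag{28} $$

where $\mathbf{h}_0$ is given in (3), and $\boldsymbol{\Lambda}^{\dagger}$ is formed by reciprocating only the non-zero diagonal elements of $\boldsymbol{\Lambda}$. Define the following length-$2^{2m}$ vectors $\boldsymbol{\mu}(\mathbf{A}_t)$ and $\boldsymbol{\nu}(\mathbf{A}_t)$ as

$$ \boldsymbol{\mu}(\mathbf{A}_t) \triangleq [\mu_0, \mu_1, \cdots, \mu_{2^{2m}-1}]^T = [\mathbf{G}(\mathbf{A}_t)\mathbf{S}]^T\, \mathbf{T}(\mathbf{1} - \mathbf{A}_t) - \left[\,|\mathbf{G}(\mathbf{A}_t)\mathbf{s}_0|^2, |\mathbf{G}(\mathbf{A}_t)\mathbf{s}_1|^2, \cdots, |\mathbf{G}(\mathbf{A}_t)\mathbf{s}_{2^{2m}-1}|^2\right]^T, \tag{29} $$

$$ \boldsymbol{\nu}(\mathbf{A}_t) \triangleq [\nu_0, \nu_1, \cdots, \nu_{2^{2m}-1}]^T = \boldsymbol{\mu}(\mathbf{A}_t) - 2A_t \cdot [\mathbf{G}(\mathbf{A}_t)\mathbf{S}]^T\mathbf{h}_0, \tag{30} $$

where $\mu_k = \mu_k(\mathbf{A}_t)$ and $\nu_k = \nu_k(\mathbf{A}_t)$ denote the $k$-th components of $\boldsymbol{\mu}(\mathbf{A}_t)$ and $\boldsymbol{\nu}(\mathbf{A}_t)$ respectively, and $\mathbf{T}$ is given in (4). Let $\Phi_{\mathbf{K}}(\mathbf{r})$ denote the distribution function of a zero-mean Gaussian random vector with covariance matrix $\mathbf{K}$. Finally define the following length-$n$ random vectors

$$ \mathbf{X}_{t_1^n} \triangleq [X_{t_1}, X_{t_2}, \cdots, X_{t_n}]^T, \qquad \mathbf{Y}_{t_1^n} \triangleq [Y_{t_1}, Y_{t_2}, \cdots, Y_{t_n}]^T, \tag{31} $$

where both $X_{t_i}$ and $Y_{t_i}$ are given in (13). Let $\mathbb{R}$ denote the set of real numbers. We are now ready to state the main result.

^5 The matrix appearing in (28), with elements $\mathbf{G}(\mathbf{A}_{t_i})$, can also be written as $\mathrm{diag}(\mathbf{G}(\mathbf{A}_{t_1}), \mathbf{G}(\mathbf{A}_{t_2}), \cdots, \mathbf{G}(\mathbf{A}_{t_n}))$.

Theorem 1. The distribution of $\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}$ equals

$$ F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r}) = E\left\{ \Phi_{\mathbf{K}_V(\mathbf{A}_{t_1^n})}\left(\mathbf{r} + \boldsymbol{\delta}(\mathbf{U}, \mathbf{A}_{t_1^n}) - \boldsymbol{\eta}(\mathbf{U}, \mathbf{A}_{t_1^n})\right) \right\} \tag{32} $$

for all $\mathbf{r} \in \mathbb{R}^n$, where the following random vectors and matrices appear in (32):

• $\mathbf{U}$ is a standard zero-mean identity-covariance Gaussian random vector of length $2mn$.

• $\boldsymbol{\delta}(\mathbf{U}, \mathbf{A}_{t_1^n}) = [\delta_1, \delta_2, \cdots, \delta_n]^T$ is a length-$n$ vector in $\mathbb{R}^n$, where

$$ \delta_i \triangleq \delta_i(\mathbf{U}, \mathbf{A}_{t_1^n}) = \max\left(\mathbf{S}^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{U} + \boldsymbol{\mu}(\mathbf{A}_{t_i})\right) - \max\left(\mathbf{S}^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{U} + \boldsymbol{\nu}(\mathbf{A}_{t_i})\right). \tag{33} $$

• $\boldsymbol{\eta}(\mathbf{U}, \mathbf{A}_{t_1^n}) = [\eta_1, \eta_2, \cdots, \eta_n]^T$ is a length-$n$ vector in $\mathbb{R}^n$, where

$$ \boldsymbol{\eta}(\mathbf{U}, \mathbf{A}_{t_1^n}) \triangleq \mathrm{diag}(A_{t_1}, A_{t_2}, \cdots, A_{t_n}) \left[\mathbf{T}\left(\mathbf{1}\mathbf{1}^T - [\mathbf{A}_{t_1}, \mathbf{A}_{t_2}, \cdots, \mathbf{A}_{t_n}]\right)\right]^T \mathbf{h}_0 - |\mathbf{h}_0|^2 \cdot \mathbf{1} + \mathbf{F}(\mathbf{A}_{t_1^n})\mathbf{U}. \tag{34} $$

• $\mathbf{K}_V(\mathbf{A}_{t_1^n})$ is the $n$ by $n$ matrix

$$ \mathbf{K}_V(\mathbf{A}_{t_1^n}) \triangleq \left[\mathrm{diag}(A_{t_1}, A_{t_2}, \cdots, A_{t_n}) \otimes \mathbf{h}_0^T\right] \mathbf{K}_W \left[\mathrm{diag}(A_{t_1}, A_{t_2}, \cdots, A_{t_n}) \otimes \mathbf{h}_0^T\right]^T - \mathbf{F}(\mathbf{A}_{t_1^n})\mathbf{F}(\mathbf{A}_{t_1^n})^T. \tag{35} $$

Refer to (3), (19), (25), (27), (28), (29) and (30) for clarifications of the notation used above. □

The proof of Theorem 1 is given in Subsection IV-A.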

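Procedure 1: Evaluating the Joint Distribution $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r})$
  Initialize: Set $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r}) := 0$ for all $\mathbf{r} \in \mathbb{R}^n$;
  1 while $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r})$ not converged do
  2   Sample $\mathbf{A}_{t_1^n} = \mathbf{a}_1^n$ using $\Pr\{\mathbf{A}_{t_1^n} = \mathbf{a}_1^n\}$; sample the length-$(2mn)$, standard zero-mean identity-covariance Gaussian vector $\mathbf{U} = \mathbf{u}$;
  3   Using the sampled realization $\mathbf{A}_{t_1^n} = \mathbf{a}_1^n$, obtain the matrices $\mathbf{Q} = \mathbf{Q}(\mathbf{a}_1^n)$ and $\boldsymbol{\Lambda} = \boldsymbol{\Lambda}(\mathbf{a}_1^n)$ satisfying Definition 8, see Appendix A;
  4   Compute $\delta_i = \delta_i(\mathbf{u}, \mathbf{a}_1^n)$ for all $i \in \{1, 2, \cdots, n\}$. For $\delta_i$ compute
        $\max_{k \in \{0,1,\cdots,2^{2m}-1\}} \mathbf{s}_k^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{u} + \mu_k(\mathbf{a})$,
        $\max_{k \in \{0,1,\cdots,2^{2m}-1\}} \mathbf{s}_k^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{u} + \nu_k(\mathbf{a})$,
      see (33). Here $\mathbf{a}$ is the sampled realization $\mathbf{A}_{t_i} = \mathbf{a}$, and both $\mu_k(\mathbf{a})$ and $\nu_k(\mathbf{a})$ are the $k$-th components of $\boldsymbol{\mu}(\mathbf{a})$ and $\boldsymbol{\nu}(\mathbf{a})$, see (29) and (30);
  5   Compute $\mathbf{F}(\mathbf{a}_1^n)$ in (28); also compute $\boldsymbol{\eta}(\mathbf{u}, \mathbf{a}_1^n)$ in (34) and $\mathbf{K}_V(\mathbf{a}_1^n)$ in (35);
  6   Update $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r}) := F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r}) + \Phi_{\mathbf{K}_V(\mathbf{a}_1^n)}\left(\mathbf{r} + \boldsymbol{\delta}(\mathbf{u}, \mathbf{a}_1^n) - \boldsymbol{\eta}(\mathbf{u}, \mathbf{a}_1^n)\right)$ for all $\mathbf{r} \in \mathbb{R}^n$;
  7 end

Both i) the joint distribution of the reliabilities $\mathbf{R}_{t_1^n} = [R_{t_1}, R_{t_2}, \cdots, R_{t_n}]^T$ in (16), and ii) the joint error probability $\Pr\{\bigcap_{i=1}^{n} \{B_{t_i} \neq A_{t_i}\}\}$, follow as corollaries from our main result Theorem 1. In the following we denote an index subset $\{\tau_1, \tau_2, \cdots, \tau_j\} \subseteq \{t_1, t_2, \cdots, t_n\}$ of size $j$, written compactly in vector form as $\tau_1^j = [\tau_1, \tau_2, \cdots, \tau_j]^T$.

Corollary 1. The distribution of $\mathbf{R}_{t_1^n} = 2/\sigma^2 \cdot |\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}|$, see Proposition 1, is given as

$$ F_{\mathbf{R}_{t_1^n}}(\mathbf{r}) = \sum_{j=0}^{n} \; \sum_{\{\tau_1, \tau_2, \cdots, \tau_j\} \subseteq \{t_1, t_2, \cdots, t_n\}} (-1)^j \cdot F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}\left(\frac{\sigma^2}{2} \cdot \boldsymbol{\alpha}(\tau_1^j, \mathbf{r})\right), $$

where the length-$n$ vector $\boldsymbol{\alpha}(\tau_1^j, \mathbf{r}) = [\alpha_1, \alpha_2, \cdots, \alpha_n]^T$ satisfies

$$ \alpha_i = \alpha_i(\tau_1^j, r_i) = \begin{cases} -r_i & \text{if } t_i \in \{\tau_1, \tau_2, \cdots, \tau_j\}, \\ r_i & \text{otherwise}, \end{cases} $$

and $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}\left(\frac{\sigma^2}{2} \cdot \boldsymbol{\alpha}(\tau_1^j, \mathbf{r})\right)$ has the similar closed form as in Theorem 1. □

Corollary 1 can be verified using recursion; for the $n$-th case we express

$$ F_{|\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}|}(\mathbf{r}) = F_{|\mathbf{X}_{t_1^{n-1}} - \mathbf{Y}_{t_1^{n-1}}|,\, X_{t_n} - Y_{t_n}}(\mathbf{r}_1^{n-1}, r_n) - F_{|\mathbf{X}_{t_1^{n-1}} - \mathbf{Y}_{t_1^{n-1}}|,\, X_{t_n} - Y_{t_n}}(\mathbf{r}_1^{n-1}, -r_n). $$

Observe that we still may apply Corollary 1 to each of the two terms on the r.h.s.; we apply Corollary 1 only to the variables $|\mathbf{X}_{t_1^{n-1}} - \mathbf{Y}_{t_1^{n-1}}|$, at the same time accounting for the (respective) joint events $\{X_{t_n} - Y_{t_n} \le r_n\}$ and $\{X_{t_n} - Y_{t_n} \le -r_n\}$. The desired expression is obtained after some algebraic manipulations.

Corollary 2. The probability $\Pr\{\bigcap_{i=1}^{n} \{B_{t_i} \neq A_{t_i}\}\}$ that all symbol decisions $B_{t_1}, B_{t_2}, \cdots, B_{t_n}$ are in error equals

$$ \Pr\left\{\bigcap_{i=1}^{n} \{B_{t_i} \neq A_{t_i}\}\right\} = 1 + \sum_{j=1}^{n} \; \sum_{\{\tau_1, \tau_2, \cdots, \tau_j\} \subseteq \{t_1, t_2, \cdots, t_n\}} (-1)^j \cdot F_{\mathbf{X}_{\tau_1^j} - \mathbf{Y}_{\tau_1^j}}(\mathbf{0}), $$

where the probability

$$ F_{\mathbf{X}_{\tau_1^j} - \mathbf{Y}_{\tau_1^j}}(\mathbf{0}) = \Pr\left\{\bigcap_{\tau \in \{\tau_1, \tau_2, \cdots, \tau_j\}} \{X_\tau - Y_\tau \le 0\}\right\} $$

has the similar closed form as in Theorem 1. □

Proof: From (13) we clearly see that the event $\{X_t > Y_t\}$ indicates that the sequence $\mathbf{B}^{[t]}$ in (8) will have its 0-th component $B_0^{[t]} \neq A_t$. Because the symbol decision $B_t$ is set to $B_t = B_0^{[t]}$, see Definition 3, the event $\{X_t > Y_t\}$ indicates that $B_t \neq A_t$, which is exactly a symbol decision error occurring at time $t$. ■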
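The signed sum over index subsets used in Corollary 1 is an inclusion-exclusion step that converts a joint distribution function of a vector D into that of |D|. The sketch below illustrates only this step: it assumes independent standard Gaussian components so that the answer is known in closed form, and the names `abs_cdf`, `Phi` and `F_indep` are hypothetical. In Corollary 1 the argument would additionally be scaled by $\sigma^2/2$ and the function F would be evaluated via Theorem 1.

```python
import itertools
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def abs_cdf(F, c):
    """Pr{|D_i| <= c_i for all i} from the joint CDF F of D, by inclusion-exclusion."""
    n = len(c)
    total = 0.0
    for j in range(n + 1):
        for subset in itertools.combinations(range(n), j):
            # alpha_i = -c_i on the chosen subset, +c_i elsewhere
            alpha = [-c[i] if i in subset else c[i] for i in range(n)]
            total += (-1) ** j * F(alpha)
    return total

# Assumed test distribution: independent standard Gaussians (not the paper's X - Y).
F_indep = lambda x: math.prod(Phi(v) for v in x)
c = [0.3, 1.2, 0.7]
lhs = abs_cdf(F_indep, c)
rhs = math.prod(2.0 * Phi(v) - 1.0 for v in c)   # known closed form for this test case
print(lhs, rhs)
```

The sum has $2^n$ terms, so each reliability distribution value requires $2^n$ evaluations of the closed form in Theorem 1.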



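Denote the realizations of $\mathbf{A}_{t_1^n}$, $\mathbf{A}_t$ and $\mathbf{U}$ as $\mathbf{A}_{t_1^n} = \mathbf{a}_1^n$, $\mathbf{A}_t = \mathbf{a}$, and $\mathbf{U} = \mathbf{u}$. The Monte-Carlo procedure used to evaluate the closed form of $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r})$ in Theorem 1 is given in Procedure 1. The following Remarks 2-5 pertain to Procedure 1.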

Remark 2. We may reduce the number of computations used to obtain the matrices $\mathbf{Q} = \mathbf{Q}(\mathbf{A}_{t_1^n})$ and $\boldsymbol{\Lambda} = \boldsymbol{\Lambda}(\mathbf{A}_{t_1^n})$ in Line 3, by sampling $\mathbf{U} = \mathbf{u}$ multiple times for a fixed $\mathbf{A}_{t_1^n} = \mathbf{a}_1^n$.

Remark 3. The matrix $\mathbf{K}_V(\mathbf{a}_1^n)$ computed in Line 5 (also see (35)) may not have full rank. Hence when evaluating the Gaussian distribution function $\Phi_{\mathbf{K}_V(\mathbf{a}_1^n)}(\mathbf{r})$ with covariance matrix $\mathbf{K}_V(\mathbf{a}_1^n)$ in Line 6, we may require techniques designed for rank-deficient covariances, see for example [13].

Remark 4. Our proposed method requires no assumptions on the noise covariance matrix $\mathbf{K}_W$ in (22), and can be applied even when the noise $W_t$ is correlated and/or non-stationary. Also, at the end of this subsection, we present a modification of the previous Procedure 1, which addresses certain cases where we do not want to consider all candidates in $\mathcal{M}$ (see Definition 2), i.e. $\Pr\{\mathbf{A}_{t_1^n} = \mathbf{a}_1^n\} = 0$ for some $\mathbf{a}_1^n$. This particular situation arises, for example, when we have a modulation code (see [14], [15]) present in the system.

Here we always assume that $\mathbf{A}_{t_1^n}$ is equally likely amongst all its realizations $\mathbf{A}_{t_1^n} = \mathbf{a}_1^n$. Further modifications will be required to extend our method to the general case of non-uniform priors $\Pr\{\mathbf{A}_{t_1^n} = \mathbf{a}_1^n\}$ (the first equality of (9) is not valid for such cases).

Remark 5. Because we have that

$$ 0 \le \Phi_{\mathbf{K}_V(\mathbf{A}_{t_1^n})}\left(\mathbf{r} + \boldsymbol{\delta}(\mathbf{U}, \mathbf{A}_{t_1^n}) - \boldsymbol{\eta}(\mathbf{U}, \mathbf{A}_{t_1^n})\right) \le 1, $$

the well-known Hoeffding probability inequalities can be applied to obtain convergence guarantees, see [16].
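As a rough illustration of Remark 5, Hoeffding's inequality for i.i.d. values in [0, 1] gives a sufficient number of Monte-Carlo trials for a target accuracy. The sketch below assumes, as a simplification not stated in the procedure itself, that the estimate at a fixed r is the sample average of N independent trial values; the function name `hoeffding_trials` and the chosen tolerances are illustrative.

```python
import math

def hoeffding_trials(eps, delta):
    """Smallest N with 2*exp(-2*N*eps^2) <= delta, for i.i.d. trial values in [0, 1]."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

eps, delta = 0.01, 0.05          # assumed accuracy and failure probability
N = hoeffding_trials(eps, delta)
print(N)
```

The bound depends only on the boundedness of the summand, not on the channel, the noise covariance or the number n of time instants, although it is typically conservative.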

The main thrust of the next subsection is to address Line 4 of Procedure 1. It appears that to execute Line 4 of Procedure 1, we require an exhaustive search over an exponential number $2^{2m}$ of terms, in order to perform the two maximizations. However, we point out in the next subsection that these maximizations can be performed more efficiently by utilizing dynamic programming optimization techniques. Also in the next subsection, we address the computation of $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r})$ in instances where one wishes to only consider a subset $\bar{\mathcal{M}} \subset \mathcal{M}$ of the candidates $\mathcal{M}$ (see Definition 2).
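[Figure 3: one trellis section from time $\tau - 1$ to time $\tau$. State at time $\tau - 1$: $[s_{\tau-\ell}, s_{\tau-\ell+1}, \cdots, s_{\tau-1}]^T$. State at time $\tau$: $[s_{\tau-\ell+1}, s_{\tau-\ell+2}, \cdots, s_\tau]^T$. Reward: $c_\tau s_\tau - |\sum_{j=0}^{\ell} h_j \cdot a_{\tau-j} \cdot s_{\tau-j}|^2$.]

Fig. 3. Time evolution of the dynamic programming states.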



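C. On computing the closed-form of $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r})$ using Procedure 1

To compute $\delta_i$ in (33) while executing Line 4 of Procedure 1, we need to perform the following two maximizations

$$ \max_{\mathbf{s} \in \{0,1\}^{2m}} \mathbf{s}^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{u} + [\mathbf{G}(\mathbf{a})\mathbf{s}]^T \cdot \mathbf{T}(\mathbf{1} - \mathbf{a}) - |\mathbf{G}(\mathbf{a})\mathbf{s}|^2, $$
$$ \max_{\mathbf{s} \in \{0,1\}^{2m}} \mathbf{s}^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{u} + [\mathbf{G}(\mathbf{a})\mathbf{s}]^T \cdot [\mathbf{T}(\mathbf{1} - \mathbf{a}) - 2a_0 \cdot \mathbf{h}_0] - |\mathbf{G}(\mathbf{a})\mathbf{s}|^2, \tag{36} $$

where both $\mathbf{a}$ and $\mathbf{u}$ are realizations $\mathbf{A}_{t_i} = \mathbf{a}$ and $\mathbf{U} = \mathbf{u}$. Note that we obtain (36) from (33), by substituting for both $\boldsymbol{\mu}(\mathbf{a})$ and $\boldsymbol{\nu}(\mathbf{a})$ using (29) and (30) respectively. Index the realization $\mathbf{A}_{t_i} = \mathbf{a}$ similarly as in Definition 2

$$ \mathbf{a} \triangleq [a_{-m-\ell}, a_{-m-\ell+1}, \cdots, a_{m+\ell}]^T. $$

Let $\mathrm{diag}(\mathbf{a})$ denote the diagonal matrix with diagonal $\mathbf{a}$. The matrix $\mathbf{G}(\mathbf{a})$ appearing in both maximization problems (36) has a distinctive structure. We now proceed to clarify this structure.

Definition 9. Let $\mathbf{g}_\tau$ denote the length $2(m + \ell) + 1$ vector

$$ \mathbf{g}_\tau \triangleq [\underbrace{0, 0, \cdots, 0}_{m+\tau}, h_\ell a_{\tau-\ell}, h_{\ell-1} a_{\tau-(\ell-1)}, \cdots, h_0 a_\tau, \underbrace{0, 0, \cdots, 0}_{m+\ell-\tau}]^T, $$

where $\tau$ can take values $\tau \in \{-m, -m+1, \cdots, m+\ell\}$.

Using the $2m + \ell + 1$ vectors $\mathbf{g}_\tau$, we rewrite $\mathbf{G}(\mathbf{a})$ as

$$ \mathbf{G}(\mathbf{a}) \triangleq \mathbf{H}\,\mathrm{diag}(\mathbf{a})\,\mathbf{E} = \begin{bmatrix} \mathbf{g}_{-m}^T \\ \mathbf{g}_{-m+1}^T \\ \vdots \\ \mathbf{g}_{m+\ell}^T \end{bmatrix} \mathbf{E}, \tag{37} $$

recall the definition of $\mathbf{G}(\mathbf{a})$ from (20). From the observed structure of $\mathbf{g}_\tau$ it can be clearly seen from (37) that $\mathbf{G}(\mathbf{a})$ is a sparse matrix with many zero entries. The matrix $\mathbf{G}(\mathbf{a})$ is an $(\ell + 1)$-banded matrix, see [17], p. 16. As is well known in the literature on ISI channels, it is efficient to employ dynamic programming techniques to solve both problems (36), by exploiting this $(\ell + 1)$-banded sparsity [2].

It is clear that the inner product $\mathbf{g}_\tau^T\mathbf{e}_j$ extracts the $j$-th component of the vector $\mathbf{g}_\tau$, i.e.

$$ \mathbf{g}_\tau^T\mathbf{e}_{\tau-j} = \begin{cases} h_j \cdot a_{\tau-j} & \text{if } 0 \le j \le \ell, \\ 0 & \text{otherwise}, \end{cases} \tag{38} $$

where $j$ satisfies $|j| \le m + \ell$. Both problems (36) are optimized over all $\mathbf{s} \in \{0, 1\}^{2m}$; we index

$$ \mathbf{s} \triangleq [s_{-m}, s_{-m+1}, \cdots, s_{-1}, s_1, s_2, \cdots, s_m]^T. $$

It is clear that by using (38), the following is true for all vectors $\mathbf{g}_\tau$ given in Definition 9

$$ \mathbf{g}_\tau^T\mathbf{E}\mathbf{s} = \sum_{j=0}^{\ell} (\mathbf{g}_\tau^T\mathbf{e}_{\tau-j}) \cdot s_{\tau-j} = \sum_{j=0}^{\ell} h_j a_{\tau-j} \cdot s_{\tau-j}, \tag{39} $$

if we set $s_0 = 0$ and $s_\tau = 0$ for all $|\tau| > m$.

Define the length-$(2m)$ vector $\mathbf{c} = [c_{-m}, c_{-m+1}, \cdots, c_{-1}, c_1, c_2, \cdots, c_m]^T$. Set $c_0 := -\infty$ and $c_\tau := 0$ for all $|\tau| > m$. By setting

$$ \mathbf{c} := \mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{u} + [\mathbf{G}(\mathbf{a})]^T \cdot \mathbf{T}(\mathbf{1} - \mathbf{a}) $$

and

$$ \mathbf{c} := \mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{u} + [\mathbf{G}(\mathbf{a})]^T \cdot [\mathbf{T}(\mathbf{1} - \mathbf{a}) - 2a_0 \cdot \mathbf{h}_0], $$

respectively, we can solve both problems (36) as

$$ \max_{\mathbf{s} \in \{0,1\}^{2m}} \mathbf{s}^T\mathbf{c} - |\mathbf{G}(\mathbf{a})\mathbf{s}|^2 = \max_{\mathbf{s} \in \{0,1\}^{2m}} \sum_{\tau=-m}^{m+\ell} \left[ c_\tau s_\tau - (\mathbf{g}_\tau^T\mathbf{E}\mathbf{s})^2 \right], \tag{40} $$

where the $\tau$-th term $\mathbf{g}_\tau^T\mathbf{E}\mathbf{s} = \sum_{j=0}^{\ell} h_j a_{\tau-j} s_{\tau-j}$. For the sake of completeness, we shall state the dynamic programming procedure that solves (40).

Procedure 2: Solving $\max \mathbf{s}^T\mathbf{c} - |\mathbf{G}(\mathbf{a})\mathbf{s}|^2$ using Dynamic Programming
  Convention: Set $c_0 := -\infty$ and also set values $c_j := 0$ for all $|j| > m$; denote the length-$\ell$ binary vector by $\mathbf{s} = [s_{\ell-1}, s_{\ell-2}, \cdots, s_0]^T$;
  Input: Matrix $\mathbf{G}(\mathbf{a})$; vector of constants $\mathbf{c}$;
  Output: Value stored in $\beta_{m+\ell}(\mathbf{s}) = \beta_{m+\ell}(\mathbf{0})$;
  Initialize: For all $\mathbf{s} \in \{0, 1\}^\ell$, set the values $\beta_{-m-1}(\mathbf{s}) := 0$ if $\mathbf{s} = \mathbf{0}$, and $-\infty$ otherwise;
  1 forall $\tau \in \{-m, -m+1, \cdots, m+\ell\}$ do
  2   forall $\mathbf{s} \in \{0, 1\}^\ell$ do
  3     Set the value $\alpha = \alpha(\mathbf{s}) := \sum_{j=0}^{\ell-1} h_j a_{\tau-j} s_j$. Set the states $\bar{\mathbf{s}}_0$ and $\bar{\mathbf{s}}_1$ as
          $\bar{\mathbf{s}}_0 := [0, s_{\ell-1}, \cdots, s_2, s_1]^T$,
          $\bar{\mathbf{s}}_1 := [1, s_{\ell-1}, \cdots, s_2, s_1]^T$.
  4     Compute $\beta_\tau(\mathbf{s}) := \max\{-\alpha^2 + \beta_{\tau-1}(\bar{\mathbf{s}}_0),\; c_{\tau-\ell} - [h_\ell a_{\tau-\ell} + \alpha]^2 + \beta_{\tau-1}(\bar{\mathbf{s}}_1)\}$;
  5   end
  6 end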
Definition 10. The dynamic programming state at time $\tau$ equals the length-$\ell$ vector of binary symbols $[s_{\tau-\ell+1}, s_{\tau-\ell+2}, \cdots, s_\tau]^T \in \{0, 1\}^\ell$.

For the benefit of readers knowledgeable in dynamic programming techniques, we illustrate the time evolution of the dynamic programming states in Figure 3. Dynamic programs can be solved with complexity that is linear in the state size [2]; in our case we have $2^\ell$ states. The dynamic programming procedure optimizing (40) is given in Procedure 2.
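The following is a minimal sketch of a Viterbi-style dynamic program for (40), checked against exhaustive search. It is not a transcription of Procedure 2: it uses the state of Definition 10 with the reward shown in Figure 3, forces $s_0 = 0$ directly instead of using $c_0 = -\infty$, and the names `H_matrix`, `dp_max`, `brute_max`, the taps and the random test data are all illustrative assumptions.

```python
import itertools
import numpy as np

def H_matrix(h, m):
    """Hypothetical helper: channel matrix H of (4) (interior columns only)."""
    ell = len(h) - 1
    H = np.zeros((2 * m + ell + 1, 2 * (m + ell) + 1))
    for r in range(H.shape[0]):
        for i, tap in enumerate(h):
            col = r + ell - i
            if ell <= col <= ell + 2 * m:
                H[r, col] = tap
    return H

def dp_max(c_full, h, a, m):
    """Maximize sum_tau [c_tau s_tau - (sum_j h_j a_{tau-j} s_{tau-j})^2] over s."""
    ell = len(h) - 1
    off = m + ell                                   # array position of index 0
    beta = {(0,) * ell: 0.0}                        # state: (s_{tau-ell+1}, ..., s_tau)
    for tau in range(-m, m + ell + 1):
        free = abs(tau) <= m and tau != 0           # s_0 and boundary symbols are forced to 0
        new_beta = {}
        for state, val in beta.items():
            for s_tau in ((0, 1) if free else (0,)):
                window = state + (s_tau,)           # (s_{tau-ell}, ..., s_tau)
                out = sum(h[j] * a[off + tau - j] * window[ell - j] for j in range(ell + 1))
                gain = (c_full[off + tau] if s_tau else 0.0) - out ** 2
                nxt = window[1:]
                if val + gain > new_beta.get(nxt, -np.inf):
                    new_beta[nxt] = val + gain
        beta = new_beta
    return max(beta.values())

def brute_max(c, G):
    """Exhaustive search over all s in {0,1}^{2m}."""
    best = -np.inf
    for bits in itertools.product([0, 1], repeat=G.shape[1]):
        s = np.array(bits, dtype=float)
        best = max(best, c @ s - np.sum((G @ s) ** 2))
    return best

rng = np.random.default_rng(2)
h = np.array([1.0, 0.5, -0.25])                     # assumed example taps
m = 3
ell = len(h) - 1
off = m + ell
a = rng.choice([-1.0, 1.0], size=2 * (m + ell) + 1) # realization of A_t
c = rng.normal(size=2 * m)                          # stand-in for the vector c in (40)
E = np.eye(2 * (m + ell) + 1)[:, [off + i for i in range(-m, m + 1) if i != 0]]
G = H_matrix(h, m) @ np.diag(a) @ E                 # equation (20)

dp_value = dp_max(E @ c, h, a, m)
bf_value = brute_max(c, G)
print(dp_value, bf_value)
```

The dynamic program visits at most $2^\ell$ states at each of the $2m + \ell + 1$ time steps, whereas the exhaustive search evaluates $2^{2m}$ vectors.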

The second part of this subsection addresses the following separate issue. Recall from Remark 4 that Theorem 1 requires no assumptions on the distribution $\Pr\{\mathbf{A}_{t_1^n} = \mathbf{a}_1^n\}$. In other words, the distribution $\Pr\{\mathbf{A}_t = \mathbf{a}\}$ for each time $t$ can be arbitrarily specified. One may particularly want to consider certain cases where some of the probabilities $\Pr\{\mathbf{A}_t = \mathbf{a}\}$ equal 0; one example of such a case is where a modulation code is present in the system [14], [15]. In these cases we would not want to consider candidates in the set $\mathcal{M}$ (see Definition 2) that have zero probability of occurrence. We would consider the subset $\bar{\mathcal{M}} \subset \mathcal{M}$, explicitly written as

$$ \bar{\mathcal{M}} = \bar{\mathcal{M}}_t \triangleq \left\{ \mathbf{a} \in \mathcal{M} : \Pr\left\{ \bigcap_{i=-m}^{m} \{A_{t+i} = a_i\} \right\} > 0 \right\} \tag{41} $$

for each time instant $t$.

If we consider the subsets $\bar{\mathcal{M}} \subset \mathcal{M}$, then Procedure 1 has to be modified. The modification of Procedure 1 is given as Procedure 3; this modification will be justified in the upcoming Section IV.

Remark 6. Line 4 of Procedure 3 may also be efficiently solved using dynamic programming techniques.

Thus far, we have completed the statement of our main result Theorem 1 and the two main Corollaries 1 and 2. We have given Procedures 1-3 (also see Appendix A), used to efficiently evaluate the given closed-form expressions. The rest of this paper is organized as follows. In the following Section IV, we prove the correctness of both Theorem 1 and Procedure 3. A simple upper bound on the rank of $\mathbf{K}_V(\mathbf{A}_{t_1^n})$ in (35) will also be given. In Section V, numerical computations are presented for various commonly cited ISI channels in magnetic recording literature [18]. The computations are performed for various scenarios, so that we may demonstrate a range of applications of our results. We conclude in Section VI.

IV. Distribution OFXt^^ - Ftj and reliability 

i?t^=2/a2.'|Xt;. -Ft? 
A. Proof of Theorem 1 



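Procedure 3: Evaluating $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r})$, for candidate subsets $\bar{\mathcal{M}}_t \subset \mathcal{M}$, see (41)

Initialize: Set $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r}) := 0$ for all $\mathbf{r} \in \mathbb{R}^n$;

1 while $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r})$ not converged do

2   Perform Lines 2-3 of Procedure 1;

3   Compute $\delta_i = \delta_i(\mathbf{u}, \mathbf{a}_{t_1^n})$ for all $i \in \{1, 2, \cdots, n\}$ by computing

$$\delta_i = \max_{k:\ \boldsymbol{\alpha}(\mathbf{E}\mathbf{s}_k, \mathbf{a}_{t_i}) \in \bar{\mathcal{M}}_{t_i}} \mathbf{s}_k^T \mathbf{Q}_i \boldsymbol{\Lambda} \mathbf{u} + \mu_k(\mathbf{a}_{t_i}) \; - \max_{k:\ \boldsymbol{\alpha}(\mathbf{E}\mathbf{s}_k + \mathbf{e}_0, \mathbf{a}_{t_i}) \in \bar{\mathcal{M}}_{t_i}} \mathbf{s}_k^T \mathbf{Q}_i \boldsymbol{\Lambda} \mathbf{u} + \nu_k(\mathbf{a}_{t_i}),$$

    see (33), where $\mu_k(\mathbf{a})$ and $\nu_k(\mathbf{a})$ denote the $k$-th components of $\boldsymbol{\mu}(\mathbf{a})$ and $\boldsymbol{\nu}(\mathbf{a})$, see (29) and (30). Both $\mathbf{E}$ and $\mathbf{e}_0$ are given in Definition 5. Also, the vector $\boldsymbol{\alpha}(\mathbf{e}, \mathbf{a}) = [\alpha_{-m-\ell}, \alpha_{-m-(\ell-1)}, \cdots, \alpha_{m+\ell}]^T$ satisfies

$$\alpha_j \triangleq \alpha_j(e_j, a_j) = \begin{cases} -a_j & \text{if } e_j = 1, \\ a_j & \text{if } e_j = 0. \end{cases}$$

4   Perform Lines 5-6 of Procedure 1;

5 end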
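The candidate restriction in Line 3 of Procedure 3 can be illustrated with a small Python sketch. It assumes symbols in {-1, +1}, applies the flipping rule for $\boldsymbol{\alpha}(\mathbf{e}, \mathbf{a})$, and restricts a maximization to admissible candidates only. The helper names (`flip`, `no_adjacent_transitions`, `restricted_max`), the toy scores, and the reading of the RLL constraint as "no two consecutive transitions" are illustrative assumptions, not part of the procedure as stated.

```python
from itertools import product

def flip(a, e):
    """alpha(e, a): negate a_j where e_j = 1, keep a_j where e_j = 0."""
    return [-aj if ej == 1 else aj for aj, ej in zip(a, e)]

def no_adjacent_transitions(a):
    """Hypothetical membership test for the restricted candidate set:
    reject sequences that contain two consecutive symbol transitions."""
    trans = [a[j] != a[j + 1] for j in range(len(a) - 1)]
    return not any(trans[j] and trans[j + 1] for j in range(len(trans) - 1))

def restricted_max(a, error_patterns, scores, admissible):
    """Maximize the scores over patterns whose flipped candidate is admissible."""
    vals = [s for e, s in zip(error_patterns, scores) if admissible(flip(a, e))]
    return max(vals) if vals else float("-inf")

# Toy example: 4 symbols, all 2^4 error patterns, made-up integer scores.
a = [1, 1, 1, 1]
patterns = [list(e) for e in product([0, 1], repeat=4)]
scores = [10 if e == [0, 1, 0, 0] else sum(e) for e in patterns]

unrestricted = max(scores)
restricted = restricted_max(a, patterns, scores, no_adjacent_transitions)
```

In this toy case the highest-scoring pattern flips a single interior symbol, which creates two consecutive transitions, so the restricted maximum falls back to the best admissible pattern.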



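We begin by showing the correctness of Theorem 1, which was stated in the previous section. Define the random variable

$$V_t \triangleq A_t \cdot \mathbf{h}_0^T \mathbf{W}_t. \tag{42}$$

It is easy to verify that $V_t$ is Gaussian: recall that $\mathbf{W}_t = [W_{t-m}, W_{t-m+1}, \cdots, W_{t+m+\ell}]^T$ is the neighborhood of (Gaussian) noise samples. To improve clarity, we shall introduce the following new notation, both used only in this section

$$\theta(\mathbf{A}_t) \triangleq A_t \cdot [\mathbf{T}(\mathbf{1} - \mathbf{A}_t)]^T \mathbf{h}_0 - |\mathbf{h}_0|^2,$$
$$\boldsymbol{\Gamma} = \boldsymbol{\Gamma}(\mathbf{A}_{t_1^n}) \triangleq \mathrm{diag}\big(\mathbf{G}(\mathbf{A}_{t_1}), \mathbf{G}(\mathbf{A}_{t_2}), \cdots, \mathbf{G}(\mathbf{A}_{t_n})\big). \tag{43}$$

Recall that $\mathbf{I}_n$ denotes a size $n$ identity matrix, and that $\otimes$ denotes the matrix Kronecker product. Using (43), we may now more compactly write

$$\begin{aligned}
\mathbf{Q}\boldsymbol{\Lambda}^2\mathbf{Q}^T &= \boldsymbol{\Gamma}^T \mathbf{K}_W \boldsymbol{\Gamma}, \\
\mathbf{F}(\mathbf{A}_{t_1^n}) &= \big[\mathrm{diag}(A_{t_1}, A_{t_2}, \cdots, A_{t_n}) \otimes \mathbf{h}_0^T\big] \mathbf{K}_W \boldsymbol{\Gamma} \cdot [\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T] \cdot \mathbf{Q}\boldsymbol{\Lambda}^\dagger, \\
\boldsymbol{\eta}(\mathbf{U}, \mathbf{A}_{t_1^n}) &= [\theta(\mathbf{A}_{t_1}), \theta(\mathbf{A}_{t_2}), \cdots, \theta(\mathbf{A}_{t_n})]^T + \mathbf{F}(\mathbf{A}_{t_1^n})\mathbf{U},
\end{aligned} \tag{44}$$

where (recall that) matrices $\mathbf{Q} = \mathbf{Q}(\mathbf{A}_{t_1^n})$ and $\boldsymbol{\Lambda} = \boldsymbol{\Lambda}(\mathbf{A}_{t_1^n})$ are given in Definition 8, matrix $\mathbf{F}(\mathbf{A}_{t_1^n})$ in (28), and $\boldsymbol{\eta}(\mathbf{U}, \mathbf{A}_{t_1^n})$ in (34).

Proposition 2. The random variables $X_t$ and $Y_t$ in (13) can be written as

$$\begin{aligned}
X_t &= \max\big([\mathbf{G}(\mathbf{A}_t)\mathbf{S}]^T \mathbf{W}_t + \boldsymbol{\nu}(\mathbf{A}_t)\big) + V_t + \theta(\mathbf{A}_t), \\
Y_t &= \max\big([\mathbf{G}(\mathbf{A}_t)\mathbf{S}]^T \mathbf{W}_t + \boldsymbol{\mu}(\mathbf{A}_t)\big),
\end{aligned}$$

where $\theta(\mathbf{A}_t) = A_t \cdot [\mathbf{T}(\mathbf{1} - \mathbf{A}_t)]^T \mathbf{h}_0 - |\mathbf{h}_0|^2$ as given in (43). □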

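Proof: We expand $\Delta(\mathbf{A}_t, \mathbf{a})$ in (11) by substituting for $\mathbf{Z}_t$ using (5) to get

$$\begin{aligned}
\Delta(\mathbf{A}_t, \mathbf{a}) &= |\mathbf{Z}_t - \mathbf{T}\mathbf{1} - \mathbf{H}\mathbf{A}_t|^2 - |\mathbf{Z}_t - \mathbf{T}\mathbf{1} - \mathbf{H}\mathbf{a}|^2 \\
&= |-\mathbf{W}_t + \mathbf{T}(\mathbf{A}_t - \mathbf{1})|^2 - |-\mathbf{W}_t + \mathbf{T}(\mathbf{A}_t - \mathbf{1}) + \mathbf{H}(\mathbf{A}_t - \mathbf{a})|^2 \\
&= -2[-\mathbf{W}_t + \mathbf{T}(\mathbf{A}_t - \mathbf{1})]^T \mathbf{H}(\mathbf{A}_t - \mathbf{a}) - |\mathbf{H}(\mathbf{A}_t - \mathbf{a})|^2.
\end{aligned} \tag{45}$$

We substitute (45) into the definition of $X_t$ and $Y_t$ in (13) to obtain

$$\begin{aligned}
X_t &= \max_{\mathbf{a} \in \mathcal{M},\ a_0 \neq A_t} [\mathbf{W}_t + \mathbf{T}(\mathbf{1} - \mathbf{A}_t)]^T \tfrac{1}{2}\mathbf{H}(\mathbf{A}_t - \mathbf{a}) - \big|\tfrac{1}{2}\mathbf{H}(\mathbf{A}_t - \mathbf{a})\big|^2, \\
Y_t &= \max_{\mathbf{a} \in \mathcal{M},\ a_0 = A_t} [\mathbf{W}_t + \mathbf{T}(\mathbf{1} - \mathbf{A}_t)]^T \tfrac{1}{2}\mathbf{H}(\mathbf{A}_t - \mathbf{a}) - \big|\tfrac{1}{2}\mathbf{H}(\mathbf{A}_t - \mathbf{a})\big|^2.
\end{aligned} \tag{46}$$

Using (17) and Definitions 2, 5 and 6, we establish the following equality of sets

$$\begin{aligned}
\big\{\tfrac{1}{2}(\mathbf{A}_t - \mathbf{a}) : \mathbf{a} \in \mathcal{M},\ a_0 \neq A_t\big\} &= \big\{\mathrm{diag}(\mathbf{A}_t)\mathbf{E}\mathbf{s}_j + A_t \cdot \mathbf{e}_0 : 0 \leq j \leq 2^{2m} - 1\big\}, \\
\big\{\tfrac{1}{2}(\mathbf{A}_t - \mathbf{a}) : \mathbf{a} \in \mathcal{M},\ a_0 = A_t\big\} &= \big\{\mathrm{diag}(\mathbf{A}_t)\mathbf{E}\mathbf{s}_j : 0 \leq j \leq 2^{2m} - 1\big\}.
\end{aligned} \tag{47}$$

Next, we utilize both (47) and (20) to rewrite (46) as

$$\begin{aligned}
X_t &= \max_{0 \leq j \leq 2^{2m}-1} [\mathbf{W}_t + \mathbf{T}(\mathbf{1} - \mathbf{A}_t)]^T [\mathbf{G}(\mathbf{A}_t)\mathbf{s}_j + A_t\mathbf{h}_0] - |\mathbf{G}(\mathbf{A}_t)\mathbf{s}_j + A_t\mathbf{h}_0|^2, \\
Y_t &= \max_{0 \leq j \leq 2^{2m}-1} [\mathbf{W}_t + \mathbf{T}(\mathbf{1} - \mathbf{A}_t)]^T [\mathbf{G}(\mathbf{A}_t)\mathbf{s}_j] - |\mathbf{G}(\mathbf{A}_t)\mathbf{s}_j|^2.
\end{aligned} \tag{48}$$

By the definition of $\boldsymbol{\mu}(\mathbf{A}_t)$ in (29) and $\mathbf{S}$ in Definition 6, the expression for $Y_t$ in the proposition statement follows from (48). For $X_t$, we continue to expand (48) to get

$$X_t = \max\Big( [\mathbf{G}(\mathbf{A}_t)\mathbf{S}]^T \mathbf{W}_t + \underbrace{\boldsymbol{\mu}(\mathbf{A}_t) - 2A_t \cdot [\mathbf{h}_0^T \mathbf{G}(\mathbf{A}_t)\mathbf{S}]^T}_{\boldsymbol{\nu}(\mathbf{A}_t)} + \underbrace{A_t \cdot \mathbf{h}_0^T \mathbf{W}_t}_{V_t} \cdot \mathbf{1} + \underbrace{\{A_t [\mathbf{T}(\mathbf{1} - \mathbf{A}_t)]^T \mathbf{h}_0 - |\mathbf{h}_0|^2\}}_{\theta(\mathbf{A}_t)} \cdot \mathbf{1} \Big)$$

in the same form as in the proposition statement, where $\boldsymbol{\nu}(\mathbf{A}_t)$ is defined in (30), $V_t$ in (42), and $\theta(\mathbf{A}_t)$ in (43). ■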
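The last expansion step in the proof can be checked numerically. The sketch below draws random vectors as stand-ins for $\mathbf{W}_t$, $\mathbf{T}(\mathbf{1} - \mathbf{A}_t)$, one column $\mathbf{G}(\mathbf{A}_t)\mathbf{s}_j$ and $\mathbf{h}_0$, and compares the $j$-th term of $X_t$ in (48) with the decomposed form. The component expressions used for `mu` and `nu` are the ones implied by the expansion; that they coincide with (29) and (30) is an assumption here, as are all variable names.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 6
w = rng.standard_normal(dim)   # stand-in for W_t
c = rng.standard_normal(dim)   # stand-in for T(1 - A_t)
g = rng.standard_normal(dim)   # stand-in for one column G(A_t) s_j
h0 = rng.standard_normal(dim)  # stand-in for h_0

results = []
for A in (-1.0, 1.0):          # binary symbol, so A * A = 1
    # j-th term inside the max for X_t in (48)
    lhs = (w + c) @ (g + A * h0) - (g + A * h0) @ (g + A * h0)
    mu = g @ c - g @ g                  # assumed j-th component of mu(A_t)
    nu = mu - 2.0 * A * (h0 @ g)        # assumed j-th component of nu(A_t)
    V = A * (h0 @ w)                    # V_t as in (42)
    theta = A * (c @ h0) - h0 @ h0      # theta(A_t) as in (43)
    rhs = g @ w + nu + V + theta
    results.append((lhs, rhs))
```

Both symbol values give matching left- and right-hand sides, which relies only on $A_t^2 = 1$.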

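Recall $\mathbf{Q} = \mathbf{Q}(\mathbf{A}_{t_1^n})$ and $\boldsymbol{\Lambda} = \boldsymbol{\Lambda}(\mathbf{A}_{t_1^n})$ from Definition 8. To prove Theorem 1 we require the following lemma.

Lemma 1. Let $\mathbf{U}$ denote a standard zero-mean identity-covariance Gaussian random vector of length $2mn$. Recall $\mathbf{W}_{t_1^n}$ in (21). The following transformation of random vectors holds

$$\begin{bmatrix} \mathbf{S}^T\mathbf{Q}_1(\mathbf{A}_{t_1^n}) \\ \mathbf{S}^T\mathbf{Q}_2(\mathbf{A}_{t_1^n}) \\ \vdots \\ \mathbf{S}^T\mathbf{Q}_n(\mathbf{A}_{t_1^n}) \end{bmatrix} \boldsymbol{\Lambda}(\mathbf{A}_{t_1^n})\mathbf{U} = \begin{bmatrix} [\mathbf{G}(\mathbf{A}_{t_1})\mathbf{S}]^T \mathbf{W}_{t_1} \\ [\mathbf{G}(\mathbf{A}_{t_2})\mathbf{S}]^T \mathbf{W}_{t_2} \\ \vdots \\ [\mathbf{G}(\mathbf{A}_{t_n})\mathbf{S}]^T \mathbf{W}_{t_n} \end{bmatrix}, \tag{49}$$

or more concisely we equivalently write

$$(\mathbf{I}_n \otimes \mathbf{S}^T) \cdot \mathbf{Q}(\mathbf{A}_{t_1^n})\boldsymbol{\Lambda}(\mathbf{A}_{t_1^n})\mathbf{U} = (\mathbf{I}_n \otimes \mathbf{S}^T) \cdot \boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})^T \mathbf{W}_{t_1^n}, \tag{50}$$

using $\mathbf{Q}(\mathbf{A}_{t_1^n})$ in (27) and $\boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})$ in (43). □

Proof: After conditioning on $\mathbf{A}_{t_1^n}$, both vectors that appear on either side of (50) are seen to be zero-mean Gaussian random vectors (recall that $\mathbf{W}_{t_1^n}$ is zero mean). Therefore to prove the lemma, we only need to verify that, after conditioning on $\mathbf{A}_{t_1^n}$, both the l.h.s. and r.h.s. of (50) have the same covariance matrix. This is easily done by using property i) of $\mathbf{Q} = \mathbf{Q}(\mathbf{A}_{t_1^n})$ in Definition 8, which yields

$$E\{\mathbf{Q}\boldsymbol{\Lambda}\mathbf{U}\mathbf{U}^T\boldsymbol{\Lambda}\mathbf{Q}^T \,|\, \mathbf{A}_{t_1^n}\} = \mathbf{Q}(\mathbf{A}_{t_1^n})\boldsymbol{\Lambda}(\mathbf{A}_{t_1^n})^2\mathbf{Q}(\mathbf{A}_{t_1^n})^T = \boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})^T \mathbf{K}_W \boldsymbol{\Gamma}(\mathbf{A}_{t_1^n}).$$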



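We are now ready to prove Theorem 1. The proof is split up into the following two separate cases:

• $\mathrm{rank}[\boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})^T \mathbf{K}_W \boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})] = 2mn$, and
• $\mathrm{rank}[\boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})^T \mathbf{K}_W \boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})] < 2mn$ for some realization $\mathbf{A}_{t_1^n} = \mathbf{a}_{t_1^n}$.

We begin with the first case.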

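Proof of Theorem 1 when $\mathrm{rank}(\boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})^T \mathbf{K}_W \boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})) = 2mn$:

We first derive the following equalities

$$\begin{aligned}
(\boldsymbol{\Lambda}^\dagger\mathbf{Q}^T)(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T)\boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})^T \mathbf{W}_{t_1^n} &= (\boldsymbol{\Lambda}^\dagger\mathbf{Q}^T)(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T)\mathbf{Q}\boldsymbol{\Lambda}\mathbf{U} \\
&= \boldsymbol{\Lambda}^\dagger\boldsymbol{\Lambda}\mathbf{U} = \mathbf{U}.
\end{aligned} \tag{51}$$

The first two equalities follow by respectively applying properties i) and ii) of the matrix $\mathbf{Q} = \mathbf{Q}(\mathbf{A}_{t_1^n})$. The last equality holds by virtue of the assumption $\mathrm{rank}(\boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})^T \mathbf{K}_W \boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})) = 2mn$, in which case $\boldsymbol{\Lambda}^\dagger$ is strictly an inverse of $\boldsymbol{\Lambda}$. Recall both $V_{t_i} = A_{t_i} \cdot \mathbf{h}_0^T \mathbf{W}_{t_i}$ and $\mathbf{V}_{t_1^n} = [V_{t_1}, V_{t_2}, \cdots, V_{t_n}]^T$. Taking (51) together with (42), we have the following transformation

$$\begin{bmatrix} \mathbf{V}_{t_1^n} \\ \mathbf{U} \end{bmatrix} = \begin{bmatrix} \mathrm{diag}(A_{t_1}, A_{t_2}, \cdots, A_{t_n}) \otimes \mathbf{h}_0^T \\ (\boldsymbol{\Lambda}^\dagger\mathbf{Q}^T)(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T)\boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})^T \end{bmatrix} \mathbf{W}_{t_1^n}. \tag{52}$$

Consider the conditional event

$$\{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n} \leq \mathbf{r} \,|\, \mathbf{A}_{t_1^n}, \mathbf{U}\} \tag{53}$$

where $\mathbf{r} = [r_1, r_2, \cdots, r_n]^T \in \mathbb{R}^n$. It is clear from both Proposition 2 and (52) that, after conditioning on both $\mathbf{A}_{t_1^n}$ and $\mathbf{U}$ in (53), the only quantity that remains random in (53) is the Gaussian vector $\mathbf{V}_{t_1^n}$. Using Lemma 1, we have the transformation

$$\mathbf{S}^T\mathbf{Q}_i(\mathbf{A}_{t_1^n})\boldsymbol{\Lambda}(\mathbf{A}_{t_1^n})\mathbf{U} = [\mathbf{G}(\mathbf{A}_{t_i})\mathbf{S}]^T \mathbf{W}_{t_i},$$

therefore we may rewrite both $X_{t_i}$ and $Y_{t_i}$ from Proposition 2 as

$$\begin{aligned}
X_{t_i} &= \max\big(\mathbf{S}^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{U} + \boldsymbol{\nu}(\mathbf{A}_{t_i})\big) + V_{t_i} + \theta(\mathbf{A}_{t_i}), \\
Y_{t_i} &= \max\big(\mathbf{S}^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{U} + \boldsymbol{\mu}(\mathbf{A}_{t_i})\big).
\end{aligned} \tag{54}$$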

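The event (53) can then be written as

$$\begin{aligned}
\{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n} \leq \mathbf{r} \,|\, \mathbf{A}_{t_1^n}, \mathbf{U}\} &= \bigcap_{1 \leq i \leq n} \{X_{t_i} \leq r_i + Y_{t_i} \,|\, \mathbf{A}_{t_1^n}, \mathbf{U}\} \\
&= \bigcap_{1 \leq i \leq n} \big\{\max\big(\mathbf{S}^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{U} + \boldsymbol{\nu}(\mathbf{A}_{t_i})\big) + V_{t_i} + \theta(\mathbf{A}_{t_i}) \leq r_i + Y_{t_i} \,\big|\, \mathbf{A}_{t_1^n}, \mathbf{U}\big\} \\
&= \bigcap_{1 \leq i \leq n} \big\{V_{t_i} + \theta(\mathbf{A}_{t_i}) \leq r_i + \max[\mathbf{S}^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{U} + \boldsymbol{\mu}(\mathbf{A}_{t_i})] - \max[\mathbf{S}^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{U} + \boldsymbol{\nu}(\mathbf{A}_{t_i})] \,\big|\, \mathbf{A}_{t_1^n}, \mathbf{U}\big\}.
\end{aligned} \tag{55}$$

Continuing from (55), we utilize (33) to rewrite

$$\{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n} \leq \mathbf{r} \,|\, \mathbf{A}_{t_1^n}, \mathbf{U}\} = \bigcap_{1 \leq i \leq n} \{V_{t_i} + \theta(\mathbf{A}_{t_i}) \leq r_i + \delta_i(\mathbf{U}, \mathbf{A}_{t_1^n}) \,|\, \mathbf{A}_{t_1^n}, \mathbf{U}\}. \tag{56}$$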

l<i<n 

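We now determine both the mean and variance of $\mathbf{V}_{t_1^n}$, after conditioning on both $\mathbf{A}_{t_1^n}$ and $\mathbf{U}$. From (52), we derive the formula

$$E\{\mathbf{V}_{t_1^n}\mathbf{U}^T \,|\, \mathbf{A}_{t_1^n}\} = \big[\mathrm{diag}(A_{t_1}, A_{t_2}, \cdots, A_{t_n}) \otimes \mathbf{h}_0^T\big] \mathbf{K}_W \cdot \boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T)\mathbf{Q}\boldsymbol{\Lambda}^\dagger = \mathbf{F}(\mathbf{A}_{t_1^n}), \tag{57}$$

where $\mathbf{F}(\mathbf{A}_{t_1^n})$ is given in (28). Next, we compute the conditional mean

$$E\{\mathbf{V}_{t_1^n} \,|\, \mathbf{A}_{t_1^n}, \mathbf{U}\} = E\{\mathbf{V}_{t_1^n} \,|\, \mathbf{A}_{t_1^n}\} + E\{\mathbf{V}_{t_1^n}\mathbf{U}^T \,|\, \mathbf{A}_{t_1^n}\}\mathbf{U} = \mathbf{F}(\mathbf{A}_{t_1^n})\mathbf{U}, \tag{58}$$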

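where the second equality follows from $E\{\mathbf{V}_{t_1^n} \,|\, \mathbf{A}_{t_1^n}\} = \mathbf{0}$ (because $\mathbf{W}_{t_1^n}$ has zero mean, see (42)), and substituting (57). The conditional covariance matrix $\mathrm{Cov}\{\mathbf{V}_{t_1^n} \,|\, \mathbf{A}_{t_1^n}, \mathbf{U}\}$ is obtained as follows

$$\begin{aligned}
\mathrm{Cov}\{\mathbf{V}_{t_1^n} \,|\, \mathbf{A}_{t_1^n}, \mathbf{U}\} &= E\{\mathbf{V}_{t_1^n}\mathbf{V}_{t_1^n}^T \,|\, \mathbf{A}_{t_1^n}\} - E\{\mathbf{V}_{t_1^n}\mathbf{U}^T \,|\, \mathbf{A}_{t_1^n}\} \cdot E\{\mathbf{U}\mathbf{V}_{t_1^n}^T \,|\, \mathbf{A}_{t_1^n}\} \\
&= \big[\mathrm{diag}(A_{t_1}, A_{t_2}, \cdots, A_{t_n}) \otimes \mathbf{h}_0^T\big] \mathbf{K}_W \big[\mathrm{diag}(A_{t_1}, A_{t_2}, \cdots, A_{t_n}) \otimes \mathbf{h}_0\big] - \mathbf{F}(\mathbf{A}_{t_1^n})\mathbf{F}(\mathbf{A}_{t_1^n})^T \\
&\triangleq \mathbf{K}_V(\mathbf{A}_{t_1^n}),
\end{aligned} \tag{59}$$

where $\mathbf{K}_V(\mathbf{A}_{t_1^n})$ is given in (35). The expression for $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r})$ in Theorem 1 now follows easily from writing (56) as

$$\{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n} \leq \mathbf{r} \,|\, \mathbf{A}_{t_1^n}, \mathbf{U}\} = \big\{\mathbf{V}_{t_1^n} + [\theta(\mathbf{A}_{t_1}), \theta(\mathbf{A}_{t_2}), \cdots, \theta(\mathbf{A}_{t_n})]^T \leq \mathbf{r} + \boldsymbol{\delta}(\mathbf{U}, \mathbf{A}_{t_1^n}) \,\big|\, \mathbf{A}_{t_1^n}, \mathbf{U}\big\}$$

and noticing that the random vector

$$\mathbf{V}_{t_1^n} + [\theta(\mathbf{A}_{t_1}), \theta(\mathbf{A}_{t_2}), \cdots, \theta(\mathbf{A}_{t_n})]^T \tag{60}$$

is (conditionally on $\mathbf{A}_{t_1^n}$ and $\mathbf{U}$) Gaussian distributed with distribution function

$$\Phi_{\mathbf{K}_V(\mathbf{A}_{t_1^n})}\big(\mathbf{r} - \boldsymbol{\eta}(\mathbf{U}, \mathbf{A}_{t_1^n})\big),$$

where both the conditional mean $\boldsymbol{\eta}(\mathbf{U}, \mathbf{A}_{t_1^n})$ and covariance $\mathbf{K}_V(\mathbf{A}_{t_1^n})$ are given respectively in (58) and (59). ■

Next we consider the other case, where the rank of $\boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})^T \mathbf{K}_W \boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})$ is less than $2mn$ for some value of $\mathbf{A}_{t_1^n} = \mathbf{a}_{t_1^n}$. In this case, the arguments of the preceding proof fail in equation (51), where the final equality does not hold because then $\boldsymbol{\Lambda}^\dagger$ is strictly not the inverse of $\boldsymbol{\Lambda}$. However, as we soon shall see, the expression for $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r})$ in Theorem 1 still holds for this case.

Proof of Theorem 1 when $\mathrm{rank}(\boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})^T \mathbf{K}_W \boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})) < 2mn$ for some $\mathbf{A}_{t_1^n} = \mathbf{a}_{t_1^n}$:

Recall that the matrix $[\boldsymbol{\Lambda}(\mathbf{A}_{t_1^n})]^\dagger = \boldsymbol{\Lambda}^\dagger$ is formed by only reciprocating the non-zero diagonal elements of $\boldsymbol{\Lambda}(\mathbf{A}_{t_1^n}) = \boldsymbol{\Lambda}$. For a particular realization $\mathbf{A}_{t_1^n} = \mathbf{a}_{t_1^n}$, let the value $j$ equal the rank of the matrix $\boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})^T \mathbf{K}_W \boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})$. Consider what happens if $j < 2mn$. Without loss of generality, assume that all non-zero diagonal elements of $\boldsymbol{\Lambda}$ are located at the first $j < 2mn$ diagonal elements of $\boldsymbol{\Lambda}$. Define the following size-$j$ quantities

• the random vector $\mathbf{U}_1^j = [U_1, U_2, \cdots, U_j]^T$, a truncated version of $\mathbf{U} = [U_1, U_2, \cdots, U_{2mn}]^T$.

• the size $2mn$ by $j$ matrix $\tilde{\mathbf{Q}}$, containing the first $j$ columns of $\mathbf{Q}$, see Definition 8.

• the size $j$ diagonal square matrix $\tilde{\boldsymbol{\Lambda}}$, containing the $j$ positive diagonal elements of $\boldsymbol{\Lambda}$, also see Definition 8.

If we substitute the new quantities $\mathbf{U}_1^j$, $\tilde{\mathbf{Q}}$ and $\tilde{\boldsymbol{\Lambda}}$ for $\mathbf{U}$, $\mathbf{Q}$ and $\boldsymbol{\Lambda}$ in equation (51), it is clear that (51) holds true, i.e.,

$$(\tilde{\boldsymbol{\Lambda}}^\dagger\tilde{\mathbf{Q}}^T)(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T)\boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})^T \mathbf{W}_{t_1^n} = (\tilde{\boldsymbol{\Lambda}}^\dagger\tilde{\mathbf{Q}}^T)(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T)\tilde{\mathbf{Q}}\tilde{\boldsymbol{\Lambda}}\mathbf{U}_1^j, \tag{61}$$

where note from Definition 8 that it must be true that $\tilde{\mathbf{Q}}^T(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T)\tilde{\mathbf{Q}} = \mathbf{I}_j$; here $\mathbf{I}_j$ is the size $j$ identity matrix. Hence, Theorem 1 clearly holds when we substitute $\mathbf{U}_1^j$, $\tilde{\mathbf{Q}}$ and $\tilde{\boldsymbol{\Lambda}}$ for $\mathbf{U}$, $\mathbf{Q}$ and $\boldsymbol{\Lambda}$.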

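Further, we can verify the following facts:

• $\tilde{\mathbf{Q}}_i\tilde{\boldsymbol{\Lambda}}\mathbf{U}_1^j = \mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{U}$, and therefore

• $\boldsymbol{\delta}(\mathbf{U}_1^j, \mathbf{A}_{t_1^n}) = \boldsymbol{\delta}(\mathbf{U}, \mathbf{A}_{t_1^n})$. Also,

• $\mathbf{F}(\mathbf{A}_{t_1^n})$ remains unaltered whether we use $\tilde{\mathbf{Q}}, \tilde{\boldsymbol{\Lambda}}$ or $\mathbf{Q}, \boldsymbol{\Lambda}$, therefore

• $\boldsymbol{\eta}(\mathbf{U}_1^j, \mathbf{A}_{t_1^n}) = \boldsymbol{\eta}(\mathbf{U}, \mathbf{A}_{t_1^n})$. Also,

• $\mathbf{K}_V(\mathbf{A}_{t_1^n})$ remains unaltered whether we use $\tilde{\mathbf{Q}}, \tilde{\boldsymbol{\Lambda}}$ or $\mathbf{Q}, \boldsymbol{\Lambda}$.

Thus we conclude that the expression for $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r})$ obtained with $\mathbf{U}_1^j$, $\tilde{\mathbf{Q}}$ and $\tilde{\boldsymbol{\Lambda}}$ coincides with the one obtained with $\mathbf{U}$, $\mathbf{Q}$ and $\boldsymbol{\Lambda}$, and thus Theorem 1 must be true even when $\mathrm{rank}[\boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})^T \mathbf{K}_W \boldsymbol{\Gamma}(\mathbf{A}_{t_1^n})] < 2mn$ for certain values of $\mathbf{A}_{t_1^n} = \mathbf{a}_{t_1^n}$. ■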
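The role of $\boldsymbol{\Lambda}^\dagger$ in the rank-deficient case can be made concrete with a short numerical sketch. It reciprocates only the non-zero diagonal entries, then shows that $\boldsymbol{\Lambda}^\dagger\boldsymbol{\Lambda}$ is no longer the identity and that $\boldsymbol{\Lambda}\mathbf{U}$ does not depend on the truncated entries of $\mathbf{U}$. The function name `diag_pinv`, the tolerance and the example values are illustrative assumptions.

```python
import numpy as np

def diag_pinv(lam, tol=1e-12):
    """Reciprocate only the non-zero diagonal entries of a diagonal matrix,
    given here as a 1-D array of its diagonal."""
    lam = np.asarray(lam, dtype=float)
    out = np.zeros_like(lam)
    nonzero = np.abs(lam) > tol
    out[nonzero] = 1.0 / lam[nonzero]
    return out

lam = np.array([2.0, 0.5, 0.0])      # rank j = 2 out of 3
pinv_times_lam = diag_pinv(lam) * lam  # diagonal of Lambda^dagger Lambda

u = np.array([0.3, -1.2, 5.0])
u_other = np.array([0.3, -1.2, -7.0])  # differs only in the truncated entry
same_product = np.array_equal(lam * u, lam * u_other)
```

The diagonal of $\boldsymbol{\Lambda}^\dagger\boldsymbol{\Lambda}$ has ones in the first $j$ positions and zeros elsewhere, which is why the final equality in (51) fails while the quantities built from $\boldsymbol{\Lambda}\mathbf{U}$ are unaffected.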

We have thus far completed our proof of Theorem 1; we next show an upper bound for the rank of the matrix $\mathbf{K}_V(\mathbf{A}_{t_1^n})$ in (59). We point out that $\mathbf{K}_V(\mathbf{A}_{t_1^n})$ sometimes may even have rank 0, i.e. $\mathbf{K}_V(\mathbf{A}_{t_1^n})$ equals the zero matrix.

B. Other comments

The following proposition states that the rank of $\mathbf{K}_V(\mathbf{A}_{t_1^n})$ depends on both the chosen time instants $\{t_1, t_2, \cdots, t_n\}$ and the MLM truncation length $m$. The following proposition gives the upper bound on $\mathrm{rank}(\mathbf{K}_V(\mathbf{A}_{t_1^n}))$.

Proposition 3. The rank of $\mathbf{K}_V(\mathbf{A}_{t_1^n})$ equals at most the number of time instants $t \in \{t_1, t_2, \cdots, t_n\}$ that satisfy $|t - t'| > m$ for all $t' \in \{t_1, t_2, \cdots, t_n\} \setminus \{t\}$. □

Proposition 3 is proved using the following lemma.

Lemma 2. If two time instants $t_1$ and $t_2$ satisfy $|t_1 - t_2| \leq m$, then observation of $[\mathbf{G}(\mathbf{A}_{t_1})\mathbf{S}]^T \mathbf{W}_{t_1}$ uniquely determines $V_{t_2} = A_{t_2} \cdot \mathbf{h}_0^T \mathbf{W}_{t_2}$ (and vice versa, observation of $[\mathbf{G}(\mathbf{A}_{t_2})\mathbf{S}]^T \mathbf{W}_{t_2}$ uniquely determines $V_{t_1} = A_{t_1} \cdot \mathbf{h}_0^T \mathbf{W}_{t_1}$). □



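TABLE I
Various ISI channels in magnetic recording [18]

Channel   Coefficients (h_0, h_1, h_2)   Memory length ℓ
PR1       1,  1                          1
Dicode    1, -1                          1
PR2       1,  2,  1                      2
PR4       1,  0, -1                      2

Proof: Recall that $V_{t_2}$ equals

$$V_{t_2} \triangleq A_{t_2} \cdot \mathbf{h}_0^T \mathbf{W}_{t_2} = A_{t_2} \cdot (h_0 W_{t_2} + h_1 W_{t_2+1} + \cdots + h_\ell W_{t_2+\ell}).$$

If the condition $|t_1 - t_2| \leq m$ is satisfied, then $W_{t_2}, \cdots, W_{t_2+\ell}$ is a length-$(\ell+1)$ subsequence of $\mathbf{W}_{t_1} = [W_{t_1-m}, W_{t_1-m+1}, \cdots, W_{t_1+m+\ell}]^T$. From the definition of $\mathbf{S}$ (see Definition 6) and because $|t_1 - t_2| \leq m$, the matrix $\mathbf{S}$ must have a column $\mathbf{s}$ that satisfies $\mathbf{E}\mathbf{s} = \mathbf{e}_{t_2-t_1}$, see Definition 5 for $\mathbf{E}$ and its columns $\mathbf{e}_j$. Then for this particular column $\mathbf{s}$ we have

$$\begin{aligned}
[\mathbf{G}(\mathbf{A}_{t_1})\mathbf{s}]^T \mathbf{W}_{t_1} &= [\mathbf{H}\,\mathrm{diag}(\mathbf{A}_{t_1})\mathbf{E}\mathbf{s}]^T \mathbf{W}_{t_1} \\
&= A_{t_2} \cdot [\mathbf{H}\mathbf{e}_{t_2-t_1}]^T \mathbf{W}_{t_1} \\
&= A_{t_2} \cdot \mathbf{h}_0^T \mathbf{W}_{t_2} = V_{t_2},
\end{aligned}$$

where the second equality holds because $\mathbf{s}$ satisfies $\mathrm{diag}(\mathbf{A}_{t_1})\mathbf{E}\mathbf{s} = \mathrm{diag}(\mathbf{A}_{t_1})\mathbf{e}_{t_2-t_1} = A_{t_2} \cdot \mathbf{e}_{t_2-t_1}$, and also

$$[\mathbf{H}\mathbf{e}_{t_2-t_1}]^T \mathbf{W}_{t_1} = [\mathbf{H}\mathbf{e}_{t_2-t_1}]^T [W_{t_1-m}, W_{t_1-m+1}, \cdots, W_{t_1+m+\ell}]^T = h_0 W_{t_2} + h_1 W_{t_2+1} + \cdots + h_\ell W_{t_2+\ell}.$$

By symmetry, the same argument holds for $[\mathbf{G}(\mathbf{A}_{t_2})\mathbf{S}]^T \mathbf{W}_{t_2}$ and $V_{t_1} = A_{t_1} \cdot \mathbf{h}_0^T \mathbf{W}_{t_1}$. ■

Proof of Proposition 3: Recall from (59) that $\mathbf{K}_V(\mathbf{A}_{t_1^n}) = \mathrm{Cov}\{\mathbf{V}_{t_1^n} \,|\, \mathbf{A}_{t_1^n}, \mathbf{U}\}$ is the (conditional) covariance matrix of $\mathbf{V}_{t_1^n}$. After conditioning on $\mathbf{U}$, the vector $\mathbf{S}^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{U} = [\mathbf{G}(\mathbf{A}_{t_i})\mathbf{S}]^T \mathbf{W}_{t_i}$ is uniquely determined, see Lemma 1. Furthermore by Lemma 2, if $\mathbf{S}^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{U} = [\mathbf{G}(\mathbf{A}_{t_i})\mathbf{S}]^T \mathbf{W}_{t_i}$ is uniquely determined, then $V_{t_j} = A_{t_j} \cdot \mathbf{h}_0^T \mathbf{W}_{t_j}$ is determined whenever $|t_i - t_j| \leq m$. Thus we conclude that the only variables $V_{t_i}$ that may contribute to the rank of $\mathbf{K}_V(\mathbf{A}_{t_1^n})$ must be those with corresponding $t_i$ that are separated from all other $\{t_1, t_2, \cdots, t_n\} \setminus \{t_i\}$ by greater than $m$. ■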
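The counting rule of Proposition 3 is simple enough to state as code. The sketch below counts the time instants that are separated from every other chosen instant by more than $m$; the function name `rank_upper_bound` is an illustrative choice, and the time instants are assumed to be distinct integers.

```python
def rank_upper_bound(times, m):
    """Upper bound on rank(K_V) from Proposition 3: the number of time
    instants t with |t - t'| > m for every other chosen instant t'."""
    return sum(
        all(abs(t - other) > m for other in times if other != t)
        for t in times
    )

# Two neighboring instants with m = 2: the bound is 0, so K_V is the zero matrix.
bound_neighbors = rank_upper_bound([0, 1], m=2)
# Two instants at lag 7 with m = 5: both are isolated, so the bound is 2.
bound_far = rank_upper_bound([0, 7], m=5)
# Mixed case: only the instant at 10 is isolated.
bound_mixed = rank_upper_bound([0, 1, 10], m=2)
```

The first two calls use the lags and truncation lengths that appear later in the joint-distribution experiments; the bound of 0 for neighboring instants is an instance of the rank-0 case mentioned before the proposition.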

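Remark 7. From the expression for $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r})$ in Theorem 1, the distribution function $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r})$ must be left-continuous [19] if $\mathrm{rank}(\mathbf{K}_V(\mathbf{A}_{t_1^n})) = n$.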

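We conclude this section by verifying the correctness of Procedure 3, used to evaluate $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\mathbf{r})$ when candidate subsets $\bar{\mathcal{M}}_t \subset \mathcal{M}$ (see (41)) are considered. The only difference between Procedures 1 and 3 is that Line 3 of Procedure 3 replaces Line 4 of Procedure 1. First verify that the following equality of sets is true

$$\begin{aligned}
\{\mathbf{a} \in \bar{\mathcal{M}}_{t_i} : a_0 \neq A_{t_i}\} &= \{\boldsymbol{\alpha}(\mathbf{E}\mathbf{s}_k + \mathbf{e}_0, \mathbf{A}_{t_i}) \in \bar{\mathcal{M}}_{t_i} : 0 \leq k \leq 2^{2m} - 1\}, \\
\{\mathbf{a} \in \bar{\mathcal{M}}_{t_i} : a_0 = A_{t_i}\} &= \{\boldsymbol{\alpha}(\mathbf{E}\mathbf{s}_k, \mathbf{A}_{t_i}) \in \bar{\mathcal{M}}_{t_i} : 0 \leq k \leq 2^{2m} - 1\},
\end{aligned} \tag{62}$$

where here the function $\boldsymbol{\alpha}(\mathbf{e}, \mathbf{A}_{t_i})$ is given in Line 3 of Procedure 3. Next perform the following verifications in the order presented:

• Replace $\mathcal{M}$ by $\bar{\mathcal{M}}_{t_i}$ in the definitions of $R_{t_i}$ in (12). Replace $\mathcal{M}$ by $\bar{\mathcal{M}}_{t_i}$ in both $X_{t_i}$ and $Y_{t_i}$ in (13). The validity of Proposition 1 remains unaffected.

• Replace $\mathcal{M}$ by $\bar{\mathcal{M}}_{t_i}$ in the proof of Proposition 2. The change first affects the proof starting from (46), and (47) needs to be slightly modified using (62). The new Proposition 2 finally reads

$$\begin{aligned}
X_{t_i} &= \max_{k:\ \boldsymbol{\alpha}(\mathbf{E}\mathbf{s}_k + \mathbf{e}_0, \mathbf{A}_{t_i}) \in \bar{\mathcal{M}}_{t_i}} \mathbf{s}_k^T[\mathbf{G}(\mathbf{A}_{t_i})]^T \mathbf{W}_{t_i} + \nu_k(\mathbf{A}_{t_i}) + V_{t_i} + \theta(\mathbf{A}_{t_i}), \\
Y_{t_i} &= \max_{k:\ \boldsymbol{\alpha}(\mathbf{E}\mathbf{s}_k, \mathbf{A}_{t_i}) \in \bar{\mathcal{M}}_{t_i}} \mathbf{s}_k^T[\mathbf{G}(\mathbf{A}_{t_i})]^T \mathbf{W}_{t_i} + \mu_k(\mathbf{A}_{t_i}).
\end{aligned}$$

• Utilize the new Proposition 2 in the proof of Theorem 1. The change first affects the proof starting from (54). Proceeding from (55)-(56) we arrive at the new formulas

$$\delta_i = \delta_i(\mathbf{U}, \mathbf{A}_{t_1^n}) = \max_{k:\ \boldsymbol{\alpha}(\mathbf{E}\mathbf{s}_k, \mathbf{A}_{t_i}) \in \bar{\mathcal{M}}_{t_i}} \mathbf{s}_k^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{U} + \mu_k(\mathbf{A}_{t_i}) \; - \max_{k:\ \boldsymbol{\alpha}(\mathbf{E}\mathbf{s}_k + \mathbf{e}_0, \mathbf{A}_{t_i}) \in \bar{\mathcal{M}}_{t_i}} \mathbf{s}_k^T\mathbf{Q}_i\boldsymbol{\Lambda}\mathbf{U} + \nu_k(\mathbf{A}_{t_i}).$$

This is exactly the way $\delta_i$ is computed in Procedure 3, Line 3.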

Fig. 4. Marginal reliability distribution $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ computed for the PR1 channel (see Table I). Truncation lengths $m$ are varied from 1 to 5. At SNR 3 dB, all curves are seen to be extremely close, with the exception of $m = 1$. At SNR 3 dB and choice of $m = 2$, the computed distribution appears close to the simulated distribution. Hence, $m = 2$ seems to be a good choice. At SNR 10 dB, a good choice appears to be $m = 5$.



This concludes our verification of Procedure 3. 

V. Numerical Computations 

We now present numerical computations performed for various ISI channels. To demonstrate the generality of our results, various cases will be considered. Both i) the reliability distribution $F_{\mathbf{R}_{t_1^n}}(\mathbf{r})$ and ii) the symbol error probability $\Pr\{\bigcap_{i=1}^n \{B_{t_i} \neq A_{t_i}\}\}$ will be graphically displayed in the following manner. Recall from Corollaries 1 and 2 that we have $F_{\mathbf{R}_{t_1^n}}(\mathbf{r}) = F_{|\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}|}(\sigma^2/2 \cdot \mathbf{r})$ (here $\sigma^2$ denotes the noise variance in (10)) and $\Pr\{\bigcap_{i=1}^n \{B_{t_i} \neq A_{t_i}\}\} = \Pr\{\mathbf{X}_{t_1^n} > \mathbf{Y}_{t_1^n}\}$. Therefore, both quantities i) and ii) will be displayed utilizing a single graphical plot of $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\sigma^2/2 \cdot \mathbf{r})$.

The chosen ISI channels for our tests are given in Table I; these are commonly-cited channels in the magnetic recording literature [18], [15]. Define the signal-to-noise (SNR) ratio as $10\log_{10}\big(\sum_{i=0}^{\ell} h_i^2/\sigma^2\big)$. The input symbol distribution $\Pr\{\mathbf{A}_t = \mathbf{a}\}$ will always be uniform, i.e. $\Pr\{\mathbf{A}_t = \mathbf{a}\} = 2^{-2(m+\ell)-1}$, see (2), unless stated otherwise.

Fig. 5. Comparing the distributions $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ across different SNRs, for a fixed truncation length $m = 5$. The channel is the PR1 channel, see Table I. The probability mass shifts to the left as SNR increases, which is expected.
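The SNR definition above fixes the noise variance once the channel taps and the SNR in dB are chosen. The following sketch converts between the two for the channels of Table I. The dictionary `CHANNELS`, the function names, and the PR4 taps written in the standard $1 - D^2$ form are illustrative assumptions.

```python
import math

# Channel taps (h_0, ..., h_l); PR4 is written as 1 - D^2 (assumed standard form).
CHANNELS = {
    "PR1": [1, 1],
    "Dicode": [1, -1],
    "PR2": [1, 2, 1],
    "PR4": [1, 0, -1],
}

def noise_variance(h, snr_db):
    """Solve SNR = 10 log10(sum_i h_i^2 / sigma^2) for sigma^2."""
    return sum(x * x for x in h) / (10.0 ** (snr_db / 10.0))

def snr_db(h, sigma2):
    """SNR in dB for taps h and noise variance sigma^2."""
    return 10.0 * math.log10(sum(x * x for x in h) / sigma2)

sigma2_pr1_10db = noise_variance(CHANNELS["PR1"], 10.0)   # 2 / 10 = 0.2
sigma2_pr2_5db = noise_variance(CHANNELS["PR2"], 5.0)
```

Note that under this definition channels with larger tap energy (for example PR2, with energy 6) are assigned a larger noise variance at the same SNR.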



A. Marginal distribution $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ when the noise is i.i.d.

First, consider the case where the noise samples $W_t$ are i.i.d., thus $\sigma^2 = E\{W_t^2\}$. Figure 4 shows the marginal distribution $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ computed for the PR1 channel (see Table I) with memory $\ell = 1$. The distribution is shown for various truncation lengths $m = 1$ to 5, and two different SNRs: 3 dB and 10 dB. At SNR 3 dB, we observe that with the exception of $m = 1$, all curves appear to be extremely close. At SNR 3 dB, a good choice for the truncation length $m$ appears to be $m = 2$; the computed distribution for $m = 2$ appears close to the simulated distribution. At SNR 10 dB, it appears that $m = 5$ is a good choice. The probability of symbol error $\Pr\{B_t \neq A_t\} = \Pr\{X_t > Y_t\} = 1 - F_{X_t - Y_t}(0)$ is observed to decrease as the truncation length $m$ increases; this is expected.
At SNR 3 dB, the (error) probability Pv{Xt>Yt} = 1 - 
Fxt-Yti(^) ~ 1-4 X 10^^ for truncation lengths m > 1. For 



SNR 10 dB, the (error) probability Pv {Xt > Yt} is seen to 
vary significantly for both truncation lengths m = 1 and 5; 
the probabihty Pr{Xt > Yt} « 1.1 x 10"^ and 1 x 10"^ ^^ 
m = 1 and 5, respectively. 

For the PR1 channel and a fixed truncation length $m = 4$, the marginal distributions $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ are compared across various SNRs in Figure 5. As SNR increases, the distributions $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ appear to concentrate more probability mass over negative values of $X_t - Y_t$. This is intuitively expected, because as the SNR increases, the symbol error probability $\Pr\{B_t \neq A_t\} = \Pr\{X_t > Y_t\} = 1 - F_{X_t - Y_t}(0)$ should decrease. From Figure 5, the (error)
probabilities Pr {Xt >Yt} are found to be approximately 
1.2 X IQ-i, 8 X 10-2^ 3 X 10-2, and 1 x 10"^ respectively for 
SNRs 3 to 10 dB.

[Fig. 6 panel annotations: $|t_1 - t_2| = 1 \leq m = 2$; $|t_1 - t_2| = 7 > 2(m + \ell) = 6$; $|t_1 - t_2| = 7 > m = 5$.]

Fig. 6. Joint reliability distribution $F_{\mathbf{X}_{t_1^2} - \mathbf{Y}_{t_1^2}}(\sigma^2/2 \cdot \mathbf{r})$ computed for both the PR1 and PR2 channels, with chosen truncation lengths $m = 2$ and 5.



B. Joint distribution $F_{\mathbf{X}_{t_1^2} - \mathbf{Y}_{t_1^2}}(\sigma^2/2 \cdot \mathbf{r})$, here $n = 2$, when the noise is i.i.d.



We consider again i.i.d. noise $W_t$, and the PR1 and PR2 channels (see Table I). Here, we choose the SNR to be moderate at 5 dB. For the PR1 channel with memory length $\ell = 1$, the truncation length is fixed to be $m = 2$. For the PR2 channel with $\ell = 2$, we fix $m = 5$. Figure 6 compares the joint distributions $F_{\mathbf{X}_{t_1^2} - \mathbf{Y}_{t_1^2}}(\sigma^2/2 \cdot \mathbf{r})$, computed for both PR1 and PR2 channels and for both time lags $|t_1 - t_2| = 1$ (i.e. neighboring symbols) and $|t_1 - t_2| = 7$. The difference between the two cases $|t_1 - t_2| = 1$ and 7 is subtle (but nevertheless inherent), as observed from the differently labeled points in the figure. For the PR1 channel, the joint symbol error probability $\Pr\{B_{t_1} \neq A_{t_1}, B_{t_2} \neq A_{t_2}\} = \Pr\{\mathbf{X}_{t_1^2} > \mathbf{Y}_{t_1^2}\}$
is approximately 6 x 10~^ and 2 x 10~^ for both cases 
ii — ^2! = 1 and 7, respectively. Similarly for the PR2, the 
(error) probability is approximately 3 x 10^^ and 1 x 10^^ 
for both respective cases $|t_1 - t_2| = 1$ and 7. Finally note that for the PR1 channel when $|t_1 - t_2| = 7$, both MLM reliability values $R_{t_1} = 2/\sigma^2 \cdot |X_{t_1} - Y_{t_1}|$ and $R_{t_2} = 2/\sigma^2 \cdot |X_{t_2} - Y_{t_2}|$ are independent; this is because then $|t_1 - t_2| = 7 > 2(m + \ell) = 6$, refer to Figure 2.

C. Marginal distribution $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ when the noise is correlated.

Consider the PR2 channel, and now consider the case where the noise samples $W_t$ are correlated. For simplicity of argument we consider single lag correlation, i.e. $E\{W_t \cdot W_{t'}\} = 0$ for all $|t - t'| > 1$, and consider the following two cases:

• the correlation coefficient $E\{W_t \cdot W_{t+1}\}/\sigma^2 = 0.5$, and

• the correlation coefficient $E\{W_t \cdot W_{t+1}\}/\sigma^2 = -0.5$.

We consider a moderate SNR of 5 dB. Figure 7 shows the distributions $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ computed for both cases. Also in Figure 7, the power spectral densities of the correlated noise samples $W_t$ (see [19], p. 408) are shown for both cases. It is apparent that the truncated MLM detector performs better (i.e. smaller symbol error probability) when the correlation coefficient $E\{W_t \cdot W_{t+1}\}/\sigma^2 = -0.5$. This is explained intuitively as follows. The detector should be able to tolerate more noise in the signaling frequency region. Observe the PR2 frequency response [18], [15] displayed in Figure 7. When the correlation coefficient equals $E\{W_t \cdot W_{t+1}\}/\sigma^2 = -0.5$, the noise power is strongest amongst signaling frequencies, and the symbol error probability $\Pr\{B_t \neq A_t\} = \Pr\{X_t > Y_t\}$ is observed
to be the lowest (approximately 8 x 10~^). On the other hand 
when the correlation coefficient is $E\{W_t \cdot W_{t+1}\}/\sigma^2 = 0.5$, the noise is strongest at frequencies near the spectral null of the PR2 channel, and the (error) probability $\Pr\{X_t > Y_t\}$ is
the highest (approximately 1.6 x 10~^). Note that in the latter 
case $E\{W_t \cdot W_{t+1}\}/\sigma^2 = -0.5$, the MLM performs even better than the i.i.d. case, see Figure 7. In the i.i.d. case, the
error probability Pr {Xt > rj w 1.3 x 10"^ 

Remark 8. One intuitively expects that similar observations will be made even for other (more complicated) choices for the noise covariance matrix $\mathbf{K}_W$, recall (22). We stress that our results are general in the sense that we may arbitrarily specify $\mathbf{K}_W$; even if the noise samples $W_t$ are non-stationary, our methods still apply.

Fig. 7. Marginal distribution $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ for correlated noises, for the PR2 channel, at SNR 5 dB. Truncation length $m = 5$. This figure suggests that the $m$-truncated MLM tolerates more noise in the frequency region where the signal power is high.

Fig. 8. Marginal distributions $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ computed for cases when a run-length limited (RLL) code is present. Here, we compare both the PR4 and dicode (see Table I) channels at SNR 5 dB. The PR4 channel has a spectral null at Nyquist frequency, but the dicode channel does not. We see how a simple RLL code, which prevents neighboring transitions, aids channels with spectral nulls at Nyquist frequency.



D. Marginal distribution $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ when the noise is i.i.d., and when run-length limited (RLL) codes are used.

We demonstrate Procedure 3 in Subsection III-C, used to compute the distribution $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ when a modulation code is present in the system. In particular, consider a run-length limited (RLL) code; we test the simple RLL code that prevents neighboring symbol transitions [14], [15]. This code improves transmission over ISI channels that have spectral nulls near the Nyquist frequency [15]; one such channel is the PR4, see Table I. Figure 8 shows $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ computed for both the PR4 and the dicode channel, see Table I. The PR4 channel has a spectral null at Nyquist frequency (recall Subsection V-C), but the dicode channel does not.

It is clearly seen from Figure 8 that the RLL code improves the performance when used on the PR4 channel. For the PR4 channel, the distribution $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ appears to concentrate more probability mass over negative values of $X_t - Y_t$ (similar to the observations made in Figure 5 when there is an SNR increase). The error probability $\Pr\{B_t \neq A_t\} = \Pr\{X_t > Y_t\} = 1 - F_{X_t - Y_t}(0)$ decreases by a factor of

2, dropping from approximately 9.5 x 10^^ to 4 x 10^^. 
On the other hand, the RLL code worsens the performance when applied to the dicode channel. For the dicode channel, $F_{X_t - Y_t}(r)$ concentrates more probability mass over positive values of $X_t - Y_t$ (similar to the observations made in Figure 5 when there is an SNR decrease), and the (error) probability
Pi{Xt > Yt} increases from approximately 8.8 x 10^^ to 
L35 X IQ-i. 

E. Marginal distribution of $2/\sigma^2 \cdot (X_t - Y_t)$, when conditioning on neighboring error events $\{B_{t-1} \neq A_{t-1}\}$ and $\{B_{t+1} \neq A_{t+1}\}$

Here we consider three neighboring symbol reliabilities, i.e. we consider $\mathbf{R}_{t-1}^{t+1} = [R_{t-1}, R_t, R_{t+1}]^T$. We consider the following two conditional distributions:

(a) $\Pr\{X_t - Y_t \leq r \,|\, X_{t-1} \leq Y_{t-1}, X_{t+1} \leq Y_{t+1}\} = \frac{1}{C_1} \cdot F_{\mathbf{X}_{t-1}^{t+1} - \mathbf{Y}_{t-1}^{t+1}}(0, r, 0)$, and

(b) $\Pr\{X_t - Y_t \leq r \,|\, X_{t-1} > Y_{t-1}, X_{t+1} > Y_{t+1}\} = \frac{1}{C_2}\big[F_{X_t - Y_t}(r) - F_{\mathbf{X}_{t}^{t+1} - \mathbf{Y}_{t}^{t+1}}(r, 0) - F_{\mathbf{X}_{t-1}^{t} - \mathbf{Y}_{t-1}^{t}}(0, r) + F_{\mathbf{X}_{t-1}^{t+1} - \mathbf{Y}_{t-1}^{t+1}}(0, r, 0)\big]$,

where the normalization constants $C_1$ and $C_2$ equal the probabilities of the (respective) events that were conditioned on. Distribution (a) is conditioned on the event that both neighboring symbols are correct, i.e. $\{B_{t-1} = A_{t-1}, B_{t+1} = A_{t+1}\}$. Distribution (b) is conditioned on the event that both neighboring symbols are wrong, i.e. $\{B_{t-1} \neq A_{t-1}, B_{t+1} \neq A_{t+1}\}$. For the PR1, PR2 and PR4 channels, both conditional distributions (a) and (b) are shown in Figures 9 and 10. We compare two different SNRs, 3 and 10 dB. For comparison purposes, we also show the unconditioned distribution $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$ in both Figures 9 and 10. We make the following observations.

Fig. 9. Marginal distributions of $X_t - Y_t$ computed for the PR1 channel, obtained when conditioning on either events $\{B_{t-1} \neq A_{t-1}, B_{t+1} \neq A_{t+1}\}$ and $\{B_{t-1} = A_{t-1}, B_{t+1} = A_{t+1}\}$. These two events correspond to error (or non-error) events at neighboring time instants $t - 1$ and $t + 1$. The solid black line represents the unconditioned marginal distribution of $X_t - Y_t$.
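The two conditional distributions (a) and (b) are obtained from the joint distribution function by inclusion-exclusion, with marginals recovered by letting the unused arguments tend to infinity. The sketch below illustrates this with a toy joint distribution in which the three differences $X - Y$ at times $t-1$, $t$, $t+1$ are independent standard Gaussians. That toy model and all function names are assumptions for illustration only; in the paper the joint distribution function would come from Theorem 1.

```python
import math

INF = math.inf

def phi(x):
    """Standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def toy_joint_cdf(r_prev, r, r_next):
    """Toy stand-in for the joint CDF of the three differences
    (independent standard Gaussians, purely illustrative)."""
    return phi(r_prev) * phi(r) * phi(r_next)

def cond_neighbors_correct(F, r):
    """Distribution (a): condition on both neighbors being correct."""
    c1 = F(0.0, INF, 0.0)
    return F(0.0, r, 0.0) / c1

def cond_neighbors_wrong(F, r):
    """Distribution (b): condition on both neighbors being in error."""
    c2 = 1.0 - F(0.0, INF, INF) - F(INF, INF, 0.0) + F(0.0, INF, 0.0)
    num = F(INF, r, INF) - F(INF, r, 0.0) - F(0.0, r, INF) + F(0.0, r, 0.0)
    return num / c2

value_a = cond_neighbors_correct(toy_joint_cdf, 0.7)
value_b = cond_neighbors_wrong(toy_joint_cdf, 0.7)
```

For the independent toy model both conditional distributions reduce to the unconditioned marginal, which is a convenient sanity check on the signs in (b).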

In all considered cases, distribution (a) is seen to be similar to the unconditioned distribution. However, distribution (b) is observed to vary for all the considered cases. Take for example the PR2 channel: we see from Figure 10 that distribution (b) has probability mass concentrated to the right of the unconditioned $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$. This is true for both SNRs 3 and 10 dB. In contrast, for the PR1 channel the MLM detector behaves differently at the two SNRs. We see from Figure 9 that at SNR 10 dB, distribution (b) has a lower symbol error probability than that of the unconditioned $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$. At SNR 3 dB however, the opposite is observed, i.e. the symbol error probability is higher than that of the distribution $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$. This is because at SNR 10 dB, errors occur sparsely, interspaced by correct symbols; it is uncommon to encounter consecutive symbols in error. Hence, conditioned on adjacent symbols $B_{t-1}$ and $B_{t+1}$ being wrong, it is uncommon for $B_t$ to be also wrong, as this is the event where we have three consecutive erroneous symbols. Finally, the observations made for the PR4 channel are again different. We notice that both distributions (a) and (b) always (practically) equal the unconditioned distribution $F_{X_t - Y_t}(\sigma^2/2 \cdot r)$. This is because the even/odd output subsequences of the PR4 channel are independent of each other.

VI. Conclusion 

In this paper, we derived closed-form expressions for both i) the reliability distributions $F_{\mathbf{X}_{t_1^n} - \mathbf{Y}_{t_1^n}}(\sigma^2/2 \cdot \mathbf{r})$, and ii) the symbol error probabilities $\Pr\{\bigcap_{i=1}^n \{B_{t_i} \neq A_{t_i}\}\}$, for the $m$-truncated MLM detector. Our results hold jointly for any number $n$ of arbitrarily chosen time instants $t_1, t_2, \cdots, t_n$. The general applicability of our result has been demonstrated for a variety of scenarios. Efficient Monte-Carlo procedures that utilize dynamic programming simplifications have been given, which can be used to numerically evaluate the closed-form expressions.

It would be interesting to further generalize the exposition to consider infinite impulse response (IIR) filters, such as in convolutional codes.

Appendix 

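A. Computing the matrix $\mathbf{Q} = \mathbf{Q}(\mathbf{A}_{t_1^n})$ in Definition 8

In this appendix, we show that the size $2mn$ square matrix $\mathbf{Q}$ with both properties i) and ii) as stated in Definition 8 can be easily found. We begin by noting from (24) that $\mathrm{rank}(\mathbf{S}\mathbf{S}^T) = 2m$, therefore the matrix $\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T$ has rank $2mn$ and is positive definite. Recall that $\mathrm{diag}(\mathbf{G}(\mathbf{A}_{t_1}), \mathbf{G}(\mathbf{A}_{t_2}), \cdots, \mathbf{G}(\mathbf{A}_{t_n}))$ is block diagonal with entries (20).

Lemma 3. Let $\mathbf{S}$ be given as in Definition 6. Let the size $2mn$ by $2mn$ square matrix $\boldsymbol{\alpha}$ diagonalize

$$\boldsymbol{\alpha}^T(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T)\boldsymbol{\alpha} = \mathbf{I}. \tag{63}$$

Let $\boldsymbol{\beta}$ be the size $2mn$ by $2mn$ eigenvector matrix in the following decomposition

$$\boldsymbol{\alpha}^T(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T)\,\mathrm{diag}(\mathbf{G}(\mathbf{A}_{t_1}), \cdots, \mathbf{G}(\mathbf{A}_{t_n}))^T \mathbf{K}_W\, \mathrm{diag}(\mathbf{G}(\mathbf{A}_{t_1}), \cdots, \mathbf{G}(\mathbf{A}_{t_n}))(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T)\boldsymbol{\alpha} = \boldsymbol{\beta}\boldsymbol{\Lambda}^2\boldsymbol{\beta}^T, \tag{64}$$

and $\boldsymbol{\Lambda}^2$ is the eigenvalue matrix of (64), therefore $\boldsymbol{\Lambda}^2$ in (64) is diagonal of size $2mn$. Then

$$\mathbf{Q} = \boldsymbol{\alpha}\boldsymbol{\beta} \tag{65}$$

satisfies both properties i) and ii) stated in Definition 8.

Proof: Because $\boldsymbol{\alpha}$ diagonalizes $\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T$ to an identity matrix $\mathbf{I}$, it follows that $\boldsymbol{\alpha}$ must have full rank, and thus have an inverse $\boldsymbol{\alpha}^{-1}$. It follows from (63) that $\boldsymbol{\alpha}^{-1} = \boldsymbol{\alpha}^T(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T)$. Replacing $\boldsymbol{\alpha}^T(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T) = \boldsymbol{\alpha}^{-1}$ in (64), we see that $\boldsymbol{\beta}$ satisfies

$$\boldsymbol{\alpha}^{-1}\,\mathrm{diag}(\mathbf{G}(\mathbf{A}_{t_1}), \cdots, \mathbf{G}(\mathbf{A}_{t_n}))^T \mathbf{K}_W\, \mathrm{diag}(\mathbf{G}(\mathbf{A}_{t_1}), \cdots, \mathbf{G}(\mathbf{A}_{t_n}))\,\boldsymbol{\alpha}^{-T} = \boldsymbol{\beta}\boldsymbol{\Lambda}^2\boldsymbol{\beta}^T. \tag{66}$$

Consider the matrix $\mathbf{Q} = \boldsymbol{\alpha}\boldsymbol{\beta}$. It follows from (66) that $\mathbf{Q} = \boldsymbol{\alpha}\boldsymbol{\beta}$ satisfies property i) in Definition 8, as seen after multiplying (the matrices satisfying) (66) on the left and right by $\boldsymbol{\alpha}$ and $\boldsymbol{\alpha}^T$, respectively. It also follows that $\mathbf{Q} = \boldsymbol{\alpha}\boldsymbol{\beta}$ satisfies property ii) in Definition 8, because

$$\mathbf{Q}^T(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T)\mathbf{Q} = \boldsymbol{\beta}^T\boldsymbol{\alpha}^T(\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T)\boldsymbol{\alpha}\boldsymbol{\beta} = \boldsymbol{\beta}^T\boldsymbol{\beta} = \mathbf{I},$$

where the last equality follows because $\boldsymbol{\beta}$ is unitary (i.e. $\boldsymbol{\beta}^{-1} = \boldsymbol{\beta}^T$) by virtue of the fact that it is an eigenvector matrix [17], p. 311. ■

Fig. 10. Marginal distributions of $X_t - Y_t$ computed for both the PR2 and PR4 channels, obtained when conditioning on either events $\{B_{t-1} \neq A_{t-1}, B_{t+1} \neq A_{t+1}\}$ and $\{B_{t-1} = A_{t-1}, B_{t+1} = A_{t+1}\}$. These two events correspond to error (or non-error) events at neighboring time instants $t - 1$ and $t + 1$. The solid black line represents the unconditioned marginal distribution of $2/\sigma^2 \cdot (X_t - Y_t)$.

To summarize Lemma 3, the matrix $\mathbf{Q} = \mathbf{Q}(\mathbf{A}_{t_1^n})$ in Definition 8 is obtained by first computing two size $2mn$ matrices $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ respectively satisfying (63) and (64), and then setting $\mathbf{Q} = \boldsymbol{\alpha}\boldsymbol{\beta}$. The matrix $\boldsymbol{\beta}$ is obtained from an eigenvalue decomposition of the size $2mn$ matrix (64), and clearly $\boldsymbol{\beta}$ depends on the symbols $\mathbf{A}_{t_1^n}$. The matrix $\boldsymbol{\alpha}$, however, is simpler to obtain. This is due to the simple form of $\mathbf{S}\mathbf{S}^T$ in (24), and we may even obtain closed form expressions for $\boldsymbol{\alpha}$, see the next remark.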
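The two-step construction of Lemma 3 can be sketched numerically as follows. The sketch takes a positive definite matrix `M` (playing the role of $\mathbf{I}_n \otimes \mathbf{S}\mathbf{S}^T$) and a positive semidefinite matrix `C` (playing the role of $\boldsymbol{\Gamma}^T\mathbf{K}_W\boldsymbol{\Gamma}$), and returns $\mathbf{Q}$ and the diagonal of $\boldsymbol{\Lambda}$. Several choices here are assumptions made for illustration: $\boldsymbol{\alpha}$ is obtained from a Cholesky factor (one of many matrices satisfying (63)), $\mathbf{S}$ is taken to have all binary vectors of length $2m$ as columns, and `C` is a made-up rank-deficient example rather than one built from a real channel.

```python
import numpy as np
from itertools import product

def build_q(M, C):
    """Return (Q, lam) with Q diag(lam^2) Q^T = C and Q^T M Q = I."""
    L = np.linalg.cholesky(M)              # M = L L^T
    alpha = np.linalg.inv(L).T             # alpha^T M alpha = I, cf. (63)
    # alpha^{-1} = L^T, so alpha^{-1} C alpha^{-T} = L^T C L, cf. (66)
    lam2, beta = np.linalg.eigh(L.T @ C @ L)
    lam = np.sqrt(np.clip(lam2, 0.0, None))  # clip round-off negatives
    return alpha @ beta, lam               # Q = alpha beta, cf. (65)

m, n = 1, 2
# Assumed form of S: all binary vectors of length 2m as columns.
S = np.array(list(product([0.0, 1.0], repeat=2 * m))).T
M = np.kron(np.eye(n), S @ S.T)

# Made-up rank-3 positive semidefinite stand-in for Gamma^T K_W Gamma.
B = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
C = B @ B.T

Q, lam = build_q(M, C)
rank_c = int(np.sum(lam > 1e-6))
```

Because `C` is rank deficient in this example, one entry of `lam` is (numerically) zero, which corresponds to the second case treated in the proof of Theorem 1.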

Remark 9. It can be verified that the following are eigenvectors of the matrix $\mathbf{S}\mathbf{S}^T$ in (24). The first $2m - 1$ eigenvectors are
i 2m-(i+l) 



where $i$ can take values $1 \leq i < 2m$, and the last eigenvector is simply $\mathbf{1}/|\mathbf{1}| = \mathbf{1}/\sqrt{2m}$.
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